Methods for the Identification of Reporter and Target Molecules Using
Comprehensive Gene Expression Profiles



TECHNICAL FIELD OF THE INVENTION 
The present invention relates to methods of identifying genes whose
expression is indicative of activation of a particular biochemical or metabolic pathway
or a common set of biological reactions or functions in a cell ("regulon indicator
genes"). The present invention provides an example of such an indicator gene. The
present invention also relates to methods of partially characterizing a gene of unknown
function by determining which biological pathways, reactions or functions its
expression is associated with, thereby placing the gene within a functional genetic
group or "regulon". These partially characterized genes may be used to identify
desirable therapeutic targets of biological pathways of interest ("regulon target
genes"). The present invention provides examples of such target genes. Methods for
identifying effectors (activators and inhibitors) of regulon target genes are provided.
The present invention also provides examples of regulon target gene inhibitors.

BACKGROUND OF THE INVENTION 
The sequencing of the S. cerevisiae genome marked the first complete,
ordered set of genes from a eukaryotic organism, and revealed the presence of over
6,000 genes on 16 chromosomes (Mewes et al., 1997; Goffeau et al., 1996). The
DNA sequence revealed the presence of 6275 known and hypothetical open reading
frames (ORFs) encoding putative proteins longer than 99 amino acids. Based
upon codon usage, which can serve as a predictor of whether or not an ORF is actually
expressed, there are currently thought to be 6222 expressed ORFs (Cherry et al.,
1997).

The sequence of the roughly 6,000 ORFs in the yeast genome is
compiled in the Saccharomyces Genome Database (SGD). The SGD provides Internet
access to the complete genomic sequence of S. cerevisiae, ORFs, and the putative
polypeptides encoded by these ORFs. The SGD can be accessed via the World Wide
Web at http://genome-www.stanford.edu/Saccharomyces/ and
http://www.mips.biochem.mpg.de/mips/yeast/. A gazetteer and genetic and physical
maps of S. cerevisiae are found in Mewes et al., 1997 (incorporated herein by
reference). References therein also contain the sequence of each chromosome of S.
cerevisiae (incorporated herein by reference).

Having the complete DNA sequence of yeast available creates an
opportunity to take a collectivist, rather than a reductionist, view of biology. We have
developed a new technology that enables the simultaneous measurement of gene
expression across an entire genome. The Genome Reporter Matrix™ (GRM) is a
matrix of units comprising living yeast cells, the cells in each unit containing one yeast
reporter fusion (GRM construct) representative of essentially every known
hypothetical ORF of S. cerevisiae. See U.S. Pat. Nos. 5,569,588 and 5,777,888. A
GRM construct comprises the promoter, 5' upstream untranslated region and usually
the first four amino acids from one of each hypothetical ORF fused to a gene encoding
an easily assayed reporter, such as green fluorescent protein (GFP), luciferase, or
β-galactosidase. For a few GRM constructs, one to ten of the first amino acids from a
hypothetical ORF are fused to the reporter. In addition, for those ORFs that have an
intron, the entire first exon and usually the first four amino acids of the second exon
are fused to the reporter. The GRM constructs are able to reveal changes in
transcription for each hypothetical ORF in response to specific stimuli. In addition, the
GRM constructs are able to reveal changes in mRNA splicing, translation and protein
stability in those cases in which the N-terminus of the protein is sufficient for
regulation.

The GRM provides an unprecedented view into the compensatory
changes a cell makes in the face of a changing environment. Such environmental
changes may be in the form of pH, salinity, temperature, osmotic pressure, nutrient
availability, as well as biochemical perturbations caused by xenobiotics, pharmaceutical
compounds and mutation. Identifying the compensatory changes a cell makes in 
response to exposure to a chemical can provide insight into the biological target of the 
chemical. For example, treatment of the GRM with the cholesterol-lowering drug 
lovastatin causes the cells to become depleted for sterols and non-sterol isoprenoids. 
The yeast cells respond by significantly up-regulating the genes encoding sterol 
biosynthetic enzymes and thus synthesizing more of the enzymes that make sterols. 
One may identify those genes that are involved in sterol biosynthesis or in related 
metabolic pathways by assaying the GRM. Because natural selection operates on a 
selected outcome rather than on a particular molecular mechanism, gene expression 
profiling strategies that detect regulatory changes through several molecular 
mechanisms contribute to a fuller view of how regulatory circuits have evolved. 

An understanding of the regulatory circuits of yeast serves two 
purposes. On the one hand, yeast is an ideal model system for eukaryotic cells, 
including mammalian cells. Therefore, an understanding of the metabolic pathways of 
yeast can be used to design or discover drugs for use in plants and animals, including 
humans. On the other hand, yeast possess certain metabolic pathways and genes which 
are unique to yeast. An understanding of the differences between yeast and higher 
eukaryotes will permit the design and discovery of antifungal drugs that target genes 
and metabolic pathways specific to yeast. See U.S. Serial No. 60/127,272, filed 
concurrently herewith. 

Yeast cells are eukaryotic and have many pathways that are similar or 
identical to those of mammalian cells. However, because yeast cells are unicellular, 
they are easier to manipulate experimentally and the results of such manipulations are 
easier to determine. Thus, yeast serves as an ideal model system for eukaryotic cells, 
including mammalian cells. The deduced protein sequences of the yeast genome 
display a significant amount of sequence identity with mammalian proteins. About 
one-third of the yeast ORFs, when aligned with their mammalian counterparts, produce 
a P-value score of less than 1 x 10^-10 (Botstein et al., 1997). This number may in fact
be a significant underestimate because the alignments were done with GenBank entries
that make up only about 10-20% of the unique human protein sequences thought to
exist.

The evolutionary conservation between yeast and humans is not limited 
to sequence identity. The list of human genes that can functionally substitute for their 
yeast counterparts is extensive. For example, H-Ras (Kataoka et al., 1985), HMG- 
CoA reductase (Basson et al., 1988) and the heme A:farnesyltransferase (Glerum and 
Tzagoloff 1994) have been shown to functionally replace their yeast counterparts. 
Researchers have utilized this evolutionary conservation to clone mammalian genes 
through their ability to complement the corresponding yeast mutants. Two examples 
include CDC2 (Lee and Nurse, 1987) and CDK2 (Elledge and Spottswood, 1991). 

Functional conservation between yeast and humans may be best 
illustrated by the notable lack of antifungal therapeutic agents available for safely 
treating systemic infections in humans. Antifungal agents certainly exist, but they are 
characterized by profound side effects likely caused by inhibition of the mammalian 
counterparts of the yeast target. L659,699, lovastatin, and zaragozic acid inhibit 
different steps in the yeast sterol pathway (HMG-CoA synthase, HMG-CoA reductase, 
and squalene synthase, respectively). These inhibitors are also potent inhibitors of the 
corresponding mammalian enzymes (Correll and Edwards, 1994). In addition, we have 
found that in experiments with over 100 pharmaceutical agents used to treat a variety 
of distinct clinical indications in mammals, approximately 80% produced significant 
changes in gene expression in the GRM, indicating that there is substantial overlap in 
drug specificity between mammalian and yeast systems. 

Yeast also contain genes that encode proteins that do not have plant 
and/or animal homologs. These non-homologous genes may be used as targets for the 
design and discovery of highly specific antifungal agents for use in plants and animals, 
including humans. The GRM may be used to identify genes that are expressed in 
particular metabolic pathways. Non-homologous genes in a pathway of interest may 
be used as targets for design and discovery of antifungal agents, for instance. See, 
e.g., U.S. Serial No. 60/127,272, filed concurrently herewith. 

One metabolic pathway of interest for identification of both
homologous and non-homologous genes is the pathway for synthesis of isoprenoids.
Eukaryotic cells utilize a group of structurally related compounds, the isoprenoids, for 
a vast array of cellular processes. These processes include structural composition of 
the lipid bilayer, electron transport during respiration, protein glycosylation, tRNA 
modification, and protein prenylation. All isoprenoids are synthesized via a pathway 
known variously as the isoprenoid pathway, mevalonate pathway, or sterol 
biosynthetic pathway. Although the bulk end product of the pathway is sterols, there 
are several branches of the pathway that lead to non-sterol isoprenoids. Due to the 
involvement of isoprenoids in a variety of physiologically and medically important 
processes, a comprehensive understanding of the regulation of this pathway would 
offer many scientific and practical benefits. 

The regulation of the isoprenoid biosynthetic pathway is known to be 
complex in all eukaryotic organisms examined, including S. cerevisiae. The overriding 
principle for the regulation of this pathway is multiple levels of feedback inhibition. 
This feedback regulation is keyed to multiple intermediates and appears to act at 
numerous steps of the pathway, involving changes in transcription, translation and 
protein stability. Additionally, the availability of molecular oxygen, required for sterol 
and heme biosynthesis, also regulates the expression of genes at key steps of the 
pathway. The emerging picture is that the isoprenoid pathway has numerous points of 
regulation that act to control overall flux through the pathway as well as the relative 
flux through various branches of the pathway. 

Given the complexity of the isoprenoid pathway, it can be difficult to 
understand the regulation of any one step of this pathway, unless it is viewed within 
the context of the entire pathway. Thus, the GRM is ideal for understanding the 
regulation of the isoprenoid pathway because one may observe the regulation of all the 
yeast genes involved in the isoprenoid pathway at one time by using the GRM. In 
addition, analysis of the gene expression provided by the GRM (preferably using 
software described below) may provide information about which particular genes in the 
isoprenoid pathway are important regulatory genes in the pathway, those which are 
important indicator genes of the isoprenoid pathway, and those which are suitable 
targets to regulate isoprenoid synthesis.

Today we have the luxury of reflecting upon the wealth of information
that has come from decades of research into the cell biology and genetics of yeast.
Still, less than 20% of the hypothetical ORFs discovered by the yeast genome project
had been previously identified through basic research (Goffeau et al., 1996).
Additionally, 25% of the yeast ORFs with obvious human homologs have no known
function (Botstein et al., 1997). The situation will likely be the same when the human
genome sequence is completed.

Several research groups have created software programs that enable the
comparison of both chemical and genetic expression profiles to identify related gene
expression response patterns, as shown, for example, in Figure 38. In addition,
expression changes of individual genes in response to any given treatment can often be
accessed through hypertext links. Currently, our software will: 1) normalize
expression data; 2) rank changes in an individual gene's expression relative to a particular
treatment; 3) rank similarities between genomic expression profiles as a result of a
chemical or genetic treatment; and 4) determine the correlation coefficient for an
individual gene's expression relative to that of all other genes to identify regulons, or
groups of genes that share the same regulatory programs. See United States
Application 09/076,668, now pending; Eisen et al. (1998); and Tamayo et al. (1999).
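
By way of illustration only, steps 1) and 2) of the list above might be sketched as
follows. This is not the software referred to in the text. The median-based
normalization, the function names and the reporter names (orf_a, orf_b, orf_c) are
assumptions made for this sketch, and the signal values are invented. The natural log
ratio of treated to untreated signal follows the "Log Ratio" convention used in the
figure descriptions.

```python
import math
from statistics import median

def normalize(profile):
    """Scale a profile so that its median reporter signal equals 1.0.

    Median scaling is an assumption of this sketch; the text does not
    say which normalization is used.
    """
    m = median(profile.values())
    return {gene: value / m for gene, value in profile.items()}

def log_ratios(treated, untreated):
    """Natural log ratio (treated/untreated) for every shared reporter."""
    t, u = normalize(treated), normalize(untreated)
    return {gene: math.log(t[gene] / u[gene]) for gene in t if gene in u}

def rank_changes(ratios):
    """Order reporters from largest to smallest absolute expression change."""
    return sorted(ratios, key=lambda gene: abs(ratios[gene]), reverse=True)

# Invented reporter signals for three hypothetical reporters.
untreated = {"orf_a": 100.0, "orf_b": 100.0, "orf_c": 100.0}
treated = {"orf_a": 400.0, "orf_b": 100.0, "orf_c": 50.0}

ratios = log_ratios(treated, untreated)
print(rank_changes(ratios))
```

Ranking by absolute value places strongly repressed reporters alongside strongly
induced ones; a signed ranking would be an equally reasonable choice.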

The ability to assign ORFs to functional groups based upon their
expression patterns will provide valuable information pertaining to the function of
proteins from model organisms as well as their mammalian counterparts. Analysis of
genomic expression patterns may also reveal upstream regulatory sequences, including
promoters, with great utility for regulated or constitutive expression of recombinant
genes. Such regulated sequences can be used for making reporter constructs for any
selected process intrinsic to a given genome.

These functional genomics studies will provide a great deal of
information that can implicate yeast genes, as well as their mammalian counterparts, in
a variety of cellular functions. Associations of particular genes with specific biological
pathways will be made by virtue of the genes' patterns of regulation under numerous
conditions.

One particular problem in the prior art has been identifying genes whose
expression is representative of a specific biological (e.g., metabolic) pathway. One
would like to be able to measure the expression of a gene or its encoded protein to
indicate the effect of a particular treatment on a specific pathway. Thus, there is a
need for various pathway indicator genes for the various metabolic pathways.

A second problem in the prior art has been identifying genes and their
encoded proteins which can be efficient targets within a specific biochemical pathway
or set of associated pathways. Once good targets have been identified, pharmaceutical
compounds and treatments may be designed or discovered to regulate the expression
or activity of the target gene or protein.

SUMMARY OF THE INVENTION 
The instant invention addresses the above problems by providing a
method using genomic arrays, such as the GRM or hybridization arrays, for identifying
indicator genes that are specific for particular biochemical pathways and sensitive to
perturbations of these pathways. The instant invention provides one such gene, HES1,
which is an indicator for the isoprenoid metabolic pathway. The invention provides the
polynucleotide sequence of HES1 and vectors and host cells comprising this sequence.
The invention also provides a method of producing HES1 recombinantly. The
invention further provides methods of using HES1 as a specific indicator of the state of
the isoprenoid pathway to identify compounds that regulate that pathway.

The instant invention also provides a method for identifying targets for
one or more biochemical pathways of interest using the GRM or other types of
genomic arrays, such as hybridization arrays. The instant invention also provides a
number of ORFs and their encoded proteins which are targets for lipid metabolism,
yeast morphology, RNA metabolism and growth control. These ORFs include
YMR134w, YER034w, YJL105w, YKL077w, YGR046w, YJR041c, YER044c and
YLR100w and their encoded proteins.

The invention provides the polynucleotide sequences of these ORFs and
vectors and host cells comprising these ORFs for use in methods of identifying,
designing and discovering highly specific anti-target agents. Specific anti-target agents
include antisense nucleic acid molecules that target YMR134w, YER034w, YJL105w,
YKL077w, YGR046w, YJR041c, YER044c and YLR100w and ribozymes that cleave
RNAs encoded by these ORFs. The invention also provides methods of
recombinantly producing the protein encoded by these ORFs for use as a target in
methods of identifying, designing and discovering highly specific antifungal agents and
for producing antibodies directed against the encoded protein. Specific anti-target
agents include antibodies that bind to the protein encoded by YMR134w, YER034w,
YJL105w, YKL077w, YGR046w, YJR041c, YER044c and YLR100w and small organic
molecules that bind to and inhibit proteins encoded by these ORFs.

BRIEF DESCRIPTION OF THE DRAWINGS 
Figure 1. Summary of Characteristics for YJL105w.

Figure 2. Plot of changes in expression of YJL105w and CYB5 in
response to different chemical treatments. Each point represents the expression
changes in a given chemical treatment. The fitness of the points to a line provides an
indication of the level of coordinate gene expression. CYB5 functions in sterol
biosynthesis through its activation of the Erg11p NADPH-cytochrome P-450
reductase.

Figure 3. Regulated Expression of YJL105w. YJL105w is significantly
induced by isoprenoid biosynthetic inhibitors and mutations in HMG-CoA synthase
(hmgs). "Log Ratio" refers to the natural log ratio of treated/untreated expression
values.

Figure 4. Effects of lovastatin on wild-type and YJL105w knockout
yeast strains. 10 µl of a 25 mg/ml solution of lovastatin (250 µg) in ethanol was
applied to a sterile drug disk on a lawn of yeast (5 x 10^6 cells, ABY363). The plates
were incubated overnight at 30 °C.

Figure 5. Summary of Characteristics for YMR134w.

Figure 6. Plot of changes in expression of YMR134w and ERG2 in
response to different chemical treatments. Each point represents the expression
changes in a given chemical treatment. The fitness of the points to a line provides an
indication of the level of coordinate gene expression. ERG2 encodes sterol isomerase.

Figure 7. Treatments Causing Highest Expression of YMR134w.
YMR134w is induced most significantly by inhibitors of the isoprenoid biosynthetic
pathway.

Figure 8. Database Searches with YMR134w. Database searches with
YMR134w did not reveal any apparent mammalian counterparts.

Figure 9. Summary of Characteristics for YER044c.

Figure 10. Plot of changes in expression of YER044c and ERG2 in
response to different chemical treatments. Each point represents the expression
changes in a given chemical treatment. The fitness of the points to a line provides an
indication of the level of coordinate gene expression.

Figure 11. Treatments Causing Highest Expression of YER044c.
YER044c is induced most significantly by inhibitors of the isoprenoid biosynthetic
pathway.

Figure 12. Database Searches with YER044c. Database searches with
YER044c reveal numerous mammalian expressed-sequence tag (EST) apparent
counterparts.

Figure 13. Comparison of the YER044c Predicted Protein Sequence
with Mouse and Human EST Translations.

Figure 14. Comparison of the YER044c Predicted Protein Sequence
with Rat EST Translation.

Figure 15. Summary of Characteristics for YLR100w.

Figure 16. Plot of changes in expression of YLR100w and CYB5 in
response to different chemical treatments. Each point represents the expression
changes in a given chemical treatment. The fitness of the points to a line provides an
indication of the level of coordinate gene expression.

Figure 17. Treatments Causing Highest Expression of YLR100w.
YLR100w is induced most significantly by inhibitors of isoprenoid biosynthesis and a
mutation in the gene encoding Erg11p.

Figure 18. Database Searches with YLR100w. Database searches with
YLR100w reveal numerous mammalian expressed-sequence tag (EST) apparent
counterparts.

Figure 19. Alignment of YLR100w to Mammalian ESTs.

Figure 20. Summary of Characteristics for YER034w.

Figure 21. Plot of changes in expression of YER034w and GPA2 in
response to different chemical treatments. Each point represents the expression
changes in a given chemical treatment. The fitness of the points to a line provides an
indication of the level of coordinate gene expression. Gpa2p, encoded by GPA2, is the
alpha subunit of a trimeric G-protein involved in pseudohyphal growth.

Figure 22. Mutation of the YER034w Gene Leads to Increased
Pseudohyphal Growth. Cells were plated onto low nitrogen plates (0.5% agarose, 2%
glucose, 0.34% yeast nitrogen base without amino acids and ammonium sulfate,
0.05 mM ammonium sulfate, 20 µg/ml uracil, 30 µg/ml leucine, and 5 µg/ml histidine)
and incubated for four days at 25 °C. Bar height represents the average number of
hyphal projections per colony (n=20).

Figure 23. Summary of Characteristics for YKL077w.

Figure 24. Plot of changes in expression of YKL077w and SGV1 in
response to different chemical treatments. Each point represents the expression
changes in a given chemical treatment. The fitness of the points to a line provides an
indication of the level of coordinate gene expression. SGV1 is a Cdc28p-related
protein kinase that is essential for yeast viability.

Figure 25. Expression Correlation of YKL077w. Expression of the
YKL077w gene correlates with that of genes involved in cell wall integrity and
cytoskeletal reorganization.

Figure 26. Database Searches with YKL077w. Database searches with
YKL077w did not reveal any apparent mammalian counterparts.

Figure 27. Summary of Characteristics for YGR046w.

Figure 28. Plot of changes in expression of YGR046w and IRA2 in
response to different chemical treatments. Each point represents the expression
changes in a given chemical treatment. The fitness of the points to a line provides an
indication of the level of coordinate gene expression. IRA2 encodes a GTPase-
activating protein for Ras1p and Ras2p.

Figure 29. Expression Correlation of YGR046w. Expression of the
YGR046w gene is correlated to other genes involved in growth control.

Figure 30. Treatments Causing the Most Significant Changes in
Expression of YGR046w. Expression of YGR046w is sensitive to agents that perturb
mitochondrial function, create oxidative stress and disrupt the cytoskeleton.

Figure 31. Summary of Characteristics for YJR041c.

Figure 32. Plot of changes in expression of YJR041c and MED7 in
response to different chemical treatments. Each point represents the expression
changes in a given chemical treatment. The fitness of the points to a line provides an
indication of the level of coordinate gene expression. MED7 is a component of the
mediator complex involved in RNA Polymerase II transcription.

Figure 33. Expression Correlation of YJR041c. Expression of
YJR041c is correlated to genes involved in RNA metabolism including RNA
polymerase I and II transcription, mRNA splicing and turnover and ribosome function.

Figure 34. Database Searches with YJR041c. Database searches with
YJR041c did not reveal any apparent mammalian counterparts.

Figure 35. Summary of Characteristics for HES1.

Figure 36. Expression Correlation of HES1.

Figure 37. Treatments that Induce the HES1 Reporter. Inhibitors of
the isoprenoid biosynthetic pathway cause a significant induction of the HES1 reporter.

Figure 38. Browser Interface of Acacia's Expression Software.

Figure 39. YJL105w DNA Sequence.
Figure 40. YJL105w Protein Sequence.
Figure 41. YMR134w DNA Sequence.
Figure 42. YMR134w Protein Sequence.
Figure 43. YER044c DNA Sequence.
Figure 44. YER044c Protein Sequence.
Figure 45. Mouse EST with Similarity to YER044c.
Figure 46. Human EST with Similarity to YER044c.
Figure 47. Rat EST with Similarity to YER044c.
Figure 48. YLR100w DNA Sequence.
Figure 49. YLR100w Protein Sequence.
Figure 50. Human EST with Similarity to YLR100w.
Figure 51. Mouse EST with Similarity to YLR100w.
Figure 52. Mouse EST with Similarity to YLR100w.
Figure 53. Mouse Gene with Similarity to YLR100w.
Figure 54. YER034w DNA Sequence.
Figure 55. YER034w Protein Sequence.
Figure 56. YKL077w DNA Sequence.
Figure 57. YKL077w Protein Sequence.
Figure 58. YGR046w DNA Sequence.
Figure 59. YGR046w Protein Sequence.
Figure 60. YJR041c DNA Sequence.
Figure 61. YJR041c Protein Sequence.
Figure 62. HES1 DNA Sequence.
Figure 63. HES1 Protein Sequence.
Figure 64. Reproducibility of the Genome Reporter Matrix™.
Fluorescence from 864 independent untreated reporter-harboring yeast strains was
plotted against the corresponding clones of an independent control array.
Figure 65. Rat Gene with Similarity to YLR100w.
Figure 66. DAK1 DNA Sequence.
Figure 67. DAK1 Protein Sequence.
Figure 68. PGU1 DNA Sequence.
Figure 69. PGU1 Protein Sequence.
Figure 70. STE18 DNA Sequence.
Figure 71. STE18 Protein Sequence.
Figure 72. YGL198w DNA Sequence.
Figure 73. YGL198w Protein Sequence.

Figure 74. Each dot on the 4-quadrant plot represents a treatment
affecting the DAK1 and PGU1 reporters. Treatments are plotted as to
whether DAK1 was up-regulated (above x-axis) or down-regulated (below x-axis) and
whether PGU1 was up-regulated (right of the y-axis) or down-regulated (left of the
y-axis). Thus, conditions where both reporters are up-regulated are in the upper right
quadrant. Each division on the graph represents one natural log ratio change relative
to controls. The hog1 knock-out profile is indicated at the lower right. Thus,
simultaneously measuring induction of PGU1 above 2 natural log ratios and repression
of DAK1 below one natural log ratio specifically indicates Hog1p pathway inactivation.

Figure 75. The plot description is the same as for Figure 74. The
subset of treatments that target mitochondrial function form a distinct group in the
upper right quadrant (within rectangle). Thus, simultaneously measuring induction of
YGL198w and STE18 should specifically indicate perturbations of the mitochondria.

DETAILED DESCRIPTION OF THE INVENTION 
Definitions and General Techniques 

Unless otherwise defined, all technical and scientific terms used herein
have the meaning as commonly understood by one of ordinary skill in the art to which
this invention belongs. The practice of the present invention employs, unless otherwise
indicated, conventional techniques of chemistry, molecular biology, microbiology,
recombinant DNA, genetics and immunology. See, e.g., Maniatis et al., 1982;
Sambrook et al., 1989; Ausubel et al., 1992; Glover, 1985; Anand, 1992; Guthrie and
Fink, 1991 (which are incorporated herein by reference).

A "regulon" is a group of genes that are coordinately regulated in
response to a number of different stimuli, e.g., treatment with chemical compounds or
mutations. The member genes of a regulon comprise a functional unit by which a cell
is able to adapt to a changing environment. The regulation of these genes that led to
their categorization could be at the level of transcription, mRNA stability, splicing,
translation or protein stability. The mode of regulation of each member gene of a
given regulon need not be the same.

Genes are categorized into separate regulons based upon changes in
gene expression. In order to efficiently and accurately group genes into functional
groups, it is necessary to observe each gene's expression change. Since many genes
function in specialized roles, it is necessary to measure global gene expression under as
diverse a variety of conditions as possible. Therefore, the database of expression
profiles used in this invention was made from a diverse collection of chemicals and
mutant strains of yeast. In general, the greater the number of diverse stimuli which
cause the genes of a regulon to exhibit coordinate expression and the higher the
correlation coefficient, the more confident one will be that the regulon is a robust
indicator of the pathway or process of interest.
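
The grouping of genes by correlation coefficient described above can be illustrated
with a minimal sketch. It assumes that each gene's expression changes are stored as a
list of log ratios, one per treatment, in the same treatment order for every gene. The
use of the Pearson coefficient, the 0.9 cut-off, the function names and the gene names
are assumptions chosen for illustration, and the values are invented.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0  # a gene that never changes carries no correlation signal
    return cov / (sd_x * sd_y)

def regulon_candidates(profiles, seed, threshold=0.9):
    """Return (gene, r) pairs whose profile correlates with the seed gene.

    The threshold is a hypothetical cut-off, not a value from the text.
    """
    hits = []
    for gene, values in profiles.items():
        if gene == seed:
            continue
        r = pearson(profiles[seed], values)
        if r >= threshold:
            hits.append((gene, r))
    return sorted(hits, key=lambda pair: pair[1], reverse=True)

# Invented log ratios across four treatments for three hypothetical genes.
profiles = {
    "seed_gene": [0.1, 1.2, 2.0, -0.5],
    "gene_a": [0.2, 2.4, 4.0, -1.0],   # moves with the seed gene
    "gene_b": [1.0, -0.3, 0.2, 0.9],   # does not
}

candidates = regulon_candidates(profiles, "seed_gene")
print(candidates)
```

Consistent with the passage, a correlation computed over only four treatments is weak
evidence; confidence grows with the number and diversity of the stimuli included.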

A "regulon indicator gene" (RIG) is a gene whose expression changes
when a particular regulon or biochemical pathway or cellular process is activated or
repressed. Although a RIG's expression may correlate with a particular biochemical
pathway, the RIG does not necessarily have to be a part of the biochemical pathway
for which it is an indicator. A RIG may comprise the entire gene, the 5' region of the
gene including the promoter and/or enhancer and all or a part of the coding region, or
a fragment, conservatively modified variant or homolog thereof which retains the
indicator function of the RIG. A RIG may be coordinately expressed with a particular
biological pathway, such that when the pathway is activated the RIG is more highly
expressed and when the pathway is repressed the RIG's expression is repressed as
well. However, the invention also encompasses RIGs in which there is an inverse
correlation with a particular pathway. In this case, activation of a pathway would lead
to a repression of RIG expression, while repression of a pathway would lead to
activation of RIG expression. Furthermore, the invention also encompasses RIGs
which are not necessarily part of the regulon, pathway or process for which they are
indicators. In this case, expression of RIGs may be activated or repressed specifically
in response to perturbations of a regulon, pathway or process even though the RIG
itself may only be indirectly related or have no apparent relationship in function to the
regulon, pathway or process.

In a preferred embodiment, a RIG is specific to a particular pathway,
wherein its expression changes most significantly when a particular pathway is
activated or repressed. Such a highly specific regulon indicator gene cannot always be
found for a pathway of interest; in such cases, more than one RIG can be identified
that, when their expression patterns are taken together, correlate with specificity to the
pathway of interest. Thus, in another preferred embodiment, a plurality of RIGs is
identified wherein the coordinated expression pattern of the plurality of RIGs is
specific to a particular biological pathway. In this preferred embodiment, expression of
each member of the plurality of RIGs may independently increase or decrease when the
biological pathway of interest is activated or repressed.
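
One way to picture a plurality of RIGs acting together is as a signature in which each
reporter has an expected direction of change and a minimum magnitude, and a treatment
is scored as affecting the pathway only when every reporter satisfies its condition.
The following is a sketch under those assumptions. The reporter names, directions and
thresholds are hypothetical and are expressed in natural log ratio units.

```python
def matches_signature(log_ratio, signature):
    """Return True when every reporter in the signature changes as expected.

    signature maps a reporter name to (direction, threshold), where
    direction is +1 for induction or -1 for repression and threshold is
    the minimum change, in natural log ratio units, in that direction.
    """
    for reporter, (direction, threshold) in signature.items():
        value = log_ratio.get(reporter)
        if value is None:
            return False  # reporter not measured in this treatment
        if direction * value < threshold:
            return False  # wrong direction or change too small
    return True

# Hypothetical two-reporter signature: one induced, one repressed.
signature = {"rig_up": (+1, 2.0), "rig_down": (-1, 1.0)}

treatment_1 = {"rig_up": 2.5, "rig_down": -1.4}  # both conditions met
treatment_2 = {"rig_up": 2.5, "rig_down": 0.3}   # second reporter not repressed

print(matches_signature(treatment_1, signature))
print(matches_signature(treatment_2, signature))
```

Requiring all reporters to agree trades sensitivity for specificity: a treatment that
moves only one reporter is not scored as a perturbation of the pathway.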

In another preferred embodiment, a RIG is highly sensitive to changes
in activation or repression of a pathway, such that even a small perturbation in
regulation of a pathway results in a change in RIG expression. In a further preferred
embodiment, a RIG has a large dynamic range, and is highly induced or repressed upon
the corresponding perturbation of the pathway to which it is correlated.

In another preferred embodiment, a RIG does not contain sequences 
that are problematic for maintaining on plasmids when introduced into host cells. Such 
sequences that may be problematic include centromeric sequences or sites that are 
particularly susceptible to recombination. 

A "target gene" or "regulon target gene" is a gene whose function is
desirable to modulate. A target gene may consist of the entire gene, the 5' region
comprising the promoter and/or enhancer and all or a part of the coding region, or a
fragment, conservatively modified variant or homolog thereof which retains the
function of the target gene. In general, a target gene encodes a protein which is a part
of the biological (e.g., metabolic or biochemical) pathway or process whose
modulation would result in a desired outcome. In a preferred embodiment, a target
gene is a control point in such a pathway. In one more preferred embodiment, a target
gene is a control point that is relatively "upstream" in the metabolic pathway.
"Upstream" means that the target gene is involved in one of the first steps of the
metabolic pathway or process. In another more preferred embodiment, a target gene
is a control point that is relatively "downstream" but specific to a biological pathway
or a branch of that pathway or process. "Downstream" means that the target gene is
involved in one of the later steps of the pathway or process.

A "target" or "target protein" is a protein whose expression or activity
is to be modulated. A target may consist of the entire protein, or a fragment, mutein,
derivative or homolog thereof which retains the function of the target. In general, a
target is a protein included within a biological pathway wherein it is desired to
modulate the process in which the protein is involved. In a preferred embodiment, a
target is a control point in such a biological pathway. In a more preferred
embodiment, a target is a control point that is relatively "upstream" in the biological
pathway. "Upstream" means that the target is involved in one of the first steps of the
pathway. In another more preferred embodiment, a target is a control point that is
relatively "downstream" but specific to a biological pathway or a branch of that
pathway. "Downstream" means that the target is involved in one of the later steps of
the pathway.

A "target-dependent reporter gene" is a gene whose expression is
altered in a cell in which the target gene has been altered or inactivated compared to
the cell which expresses the normal target gene. The expression of the
target-dependent reporter gene may increase or decrease in a cell harboring an altered or
inactivated target gene, depending upon the identity of the gene. If expression of the
target-dependent reporter gene increases in the cell harboring the altered or inactivated
target gene, then a potential inhibitor of the regulon target gene will increase
expression of the target-dependent reporter gene, and if expression of the
target-dependent reporter gene decreases in the cell, then a potential inhibitor of the regulon
target gene will decrease expression of the target-dependent reporter gene.

By "pathway" is meant any biological, e.g., metabolic or biochemical,
set of concerted reactions which occur in response to a particular signal or stimulus in
a cell. The isoprenoid pathway is one example of such a pathway. Other pathways
include, without limitation, amino acid and protein synthesis, lipid synthesis, protein
and lipid glycosylation, protein modification, DNA synthesis and repair, RNA
transcription, phospholipid synthesis, nucleotide synthesis, energy generation and
storage (e.g., glycolysis, citric acid cycle, oxidative phosphorylation, gluconeogenesis,
pentose phosphate pathway, fatty acid metabolism, glycogen and disaccharide
metabolism, amino acid degradation and the urea cycle), signal transduction and
growth control.

By "process" is meant any biological reaction or set of reactions that 
occurs within a cell or organism that occurs in response to a stimulus or signal, or that 
occurs during growth, homeostasis, development, differentiation or death of the cell or 
organism. 

An "isolated" protein or polypeptide is one that has been separated
from naturally associated components that accompany it in its native state. Thus, a
polypeptide that is chemically synthesized or synthesized in a cellular system different
from the cell from which it naturally originates will be "isolated" from its naturally
associated components. A protein may also be rendered substantially free of naturally
associated components by isolation, using protein purification techniques well known
in the art.

A monomeric protein is "substantially pure," "substantially
homogeneous" or "substantially purified" when at least about 60 to 75% of a sample
exhibits a single polypeptide sequence. A substantially pure protein will typically
comprise about 60 to 90% W/W of a protein sample, more usually about 95%, and
preferably will be over 99% pure. Protein purity or homogeneity may be indicated by a
number of means well known in the art, such as polyacrylamide gel electrophoresis of a
protein sample, followed by visualizing a single polypeptide band upon staining the gel
with a stain well known in the art. For certain purposes, higher resolution may be
provided by using HPLC or other means well known in the art for purification.

A S. cerevisiae protein has "homology" or is "homologous" to a
protein from another organism if the encoded amino acid sequence of the yeast protein
has a similar sequence to the encoded amino acid sequence of a protein of a different
organism. Alternatively, a S. cerevisiae protein may have homology or be homologous
to another S. cerevisiae protein if the two proteins have similar amino acid sequences.
Although two proteins are said to be "homologous," this does not imply that there is
necessarily an evolutionary relationship between the proteins. Instead, the term
"homologous" is defined to mean that the two proteins have similar amino acid
sequences. In addition, although in many cases proteins with similar amino acid
sequences will have similar functions, the term "homologous" does not imply that the
proteins must be functionally similar to each other.

When "homologous" is used in reference to proteins or peptides, it is
recognized that residue positions that are not identical often differ by conservative
amino acid substitutions. A "conservative amino acid substitution" is one in which an
amino acid residue is substituted by another amino acid residue having a side chain (R
group) with similar chemical properties (e.g., charge or hydrophobicity). In general, a
conservative amino acid substitution will not substantially change the functional
properties of a protein. In cases where two or more amino acid sequences differ from
each other by conservative substitutions, the percent sequence identity or degree of
homology may be adjusted upwards to correct for the conservative nature of the
substitution. Means for making this adjustment are well known to those of skill in the
art (see, e.g., Pearson et al., 1994, and Henikoff et al., 1992, herein incorporated by
reference).

The following six groups each contain amino acids that are conservative
substitutions for one another:

1) Alanine (A), Serine (S), Threonine (T);

2) Aspartic Acid (D), Glutamic Acid (E);

3) Asparagine (N), Glutamine (Q);

4) Arginine (R), Lysine (K);

5) Isoleucine (I), Leucine (L), Methionine (M), Valine (V); and

6) Phenylalanine (F), Tyrosine (Y), Tryptophan (W).

Sequence homology for polypeptides, which is also referred to as
sequence identity, is typically measured using sequence analysis software. See, e.g.,
the Sequence Analysis Software Package of the Genetics Computer Group (GCG),
University of Wisconsin Biotechnology Center, 910 University Avenue, Madison,
Wisconsin 53705. Protein analysis software matches similar sequences using measures
of homology assigned to various substitutions, deletions and other modifications,
including conservative amino acid substitutions. For instance, GCG contains programs
such as "Gap" and "Bestfit" which can be used with default parameters to determine
sequence homology or sequence identity between closely related polypeptides, such as
homologous polypeptides from different species of organisms or between a wild type
protein and a mutein thereof.
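
By way of illustration only, the six conservative substitution groups listed above can be encoded as a lookup and used to score a pair of already-aligned, equal-length polypeptide sequences. The following Python sketch is not the GCG implementation; the function names `is_conservative` and `identity_and_similarity` are hypothetical, and the sketch assumes ungapped sequences that have been aligned by other means.

```python
# The six groups of mutually conservative amino acids, as listed in the text.
CONSERVATIVE_GROUPS = [
    set("AST"),   # 1) Alanine, Serine, Threonine
    set("DE"),    # 2) Aspartic Acid, Glutamic Acid
    set("NQ"),    # 3) Asparagine, Glutamine
    set("RK"),    # 4) Arginine, Lysine
    set("ILMV"),  # 5) Isoleucine, Leucine, Methionine, Valine
    set("FYW"),   # 6) Phenylalanine, Tyrosine, Tryptophan
]


def is_conservative(a, b):
    """True if residues a and b differ but belong to the same group."""
    a, b = a.upper(), b.upper()
    if a == b:
        return False
    return any(a in group and b in group for group in CONSERVATIVE_GROUPS)


def identity_and_similarity(seq_a, seq_b):
    """Return (percent identity, percent identity plus conservative changes).

    Assumption: seq_a and seq_b are already aligned, ungapped and of equal
    length; no alignment is performed here.
    """
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be non-empty and of equal length")
    pairs = list(zip(seq_a.upper(), seq_b.upper()))
    identical = sum(1 for x, y in pairs if x == y)
    conservative = sum(1 for x, y in pairs if is_conservative(x, y))
    n = len(pairs)
    return 100.0 * identical / n, 100.0 * (identical + conservative) / n
```

For example, `identity_and_similarity("ADKF", "AEKW")` counts two identical positions and two conservative substitutions (D/E and F/W) over four residues.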

A preferred algorithm when comparing a S. cerevisiae sequence to a
database containing a large number of sequences from different organisms is the
computer program BLAST, especially blastp or tblastn (Altschul et al., 1997, herein
incorporated by reference). Preferred parameters for blastp are:

Expectation value: 10 (default)
Filter: seg (default)
Cost to open a gap: 11 (default)
Cost to extend a gap: 1 (default)
Max. alignments: 100 (default)
Word size: 11 (default)
No. of descriptions: 100 (default)
Substitution Matrix: BLOSUM62

The length of polypeptide sequences compared for homology will
generally be at least about 16 amino acid residues, usually at least about 20 residues,
more usually at least about 24 residues, typically at least about 28 residues, and
preferably more than about 35 residues. When searching a database containing
sequences from a large number of different organisms using a S. cerevisiae query
sequence, it is preferable to compare amino acid sequences. Comparison of amino acid
sequences is preferred to comparing nucleotide sequences because S. cerevisiae has
significantly different codon usage compared to mammalian or plant codon usage.

Database searching using amino acid sequences can be performed with
algorithms other than blastp known in the art. For instance, polypeptide sequences can
be compared using Fasta, a program in GCG Version 6.1. Fasta provides alignments
and percent sequence identity of the regions of the best overlap between the query and
search sequences (Pearson, 1990, herein incorporated by reference). For example,
percent sequence identity between amino acid sequences can be determined using Fasta
with its default parameters (a word size of 2 and the PAM250 scoring matrix), as
provided in GCG Version 6.1, herein incorporated by reference.

The invention envisions two general types of polypeptide "homologs."
Type 1 homologs are strong homologs. A comparison of two polypeptides that are
Type 1 homologs would result in a blastp score of less than $1 \times 10^{-40}$, using the blastp
algorithm and the parameters listed above. The lower the blastp score, that is, the
closer it is to zero, the better the match between the polypeptide sequences. For
instance, yeast lanosterol demethylase, which is a common target of antifungal agents,
as discussed above, has a Type 1 homolog in humans. The probability score (e.g.,
blastp score) is dependent upon the size of the database. Comparison of yeast and
human lanosterol demethylases produces a blastp score of $1 \times 10^{-86}$.

Type 2 homologs are weaker homologs. A comparison of two
polypeptides that are Type 2 homologs would result in a blastp score of between
$1 \times 10^{-40}$ and $1 \times 10^{-10}$, using the Blast algorithm and the parameters listed above. One having
ordinary skill in the art will recognize that other algorithms can be used to determine
weak or strong homology.
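
The two score ranges just described lend themselves to a simple classification rule, sketched below in Python for illustration. The function name `classify_homolog` and the constant names are hypothetical. The text does not state how a score exactly equal to a boundary value is treated, so assigning both boundary values to Type 2 is an assumption of this sketch.

```python
# Score boundaries taken from the text; the names are illustrative only.
TYPE1_UPPER_BOUND = 1e-40  # Type 1: blastp score less than 1 x 10^-40
TYPE2_UPPER_BOUND = 1e-10  # Type 2: score between 1 x 10^-40 and 1 x 10^-10


def classify_homolog(blastp_score):
    """Classify a blastp score as 'Type 1', 'Type 2' or 'neither'.

    Assumption: scores exactly on a boundary are treated as Type 2.
    """
    if blastp_score < 0:
        raise ValueError("blastp score must be non-negative")
    if blastp_score < TYPE1_UPPER_BOUND:
        return "Type 1"
    if blastp_score <= TYPE2_UPPER_BOUND:
        return "Type 2"
    return "neither"
```

Under this rule, the score reported above for yeast and human lanosterol demethylases (1 x 10^-86) falls in the Type 1 range.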

The terms "no substantial homology" or "no human (or mammalian,
vertebrate, amphibian, fish, insect or plant) homolog" refer to a yeast polypeptide
sequence which exhibits no substantial sequence identity with a polypeptide sequence
from human, non-human mammals, other vertebrates, insects or plants. A comparison
of two polypeptides which have no substantial homology to one another would result
in a blastp score of greater than $1 \times 10^{-10}$, using the Blast algorithm and the parameters
listed above. One having ordinary skill in the art will recognize that other algorithms
can be used to determine whether two polypeptides demonstrate no substantial
homology to each other.

A polypeptide "fragment," "portion" or "segment" refers to a stretch of
amino acid residues of at least about five to seven contiguous amino acids, often at
least about seven to nine contiguous amino acids, typically at least about nine to 13
contiguous amino acids and, most preferably, at least about 20 to 30 or more
contiguous amino acids.

A polypeptide "mutein" refers to a polypeptide whose sequence
contains substitutions, insertions or deletions of one or more amino acids compared to
the amino acid sequence of the native or wild type protein. A mutein has at least 50%
sequence homology to the wild type protein, preferably 60% sequence homology, and
more preferably 70% sequence homology. Most preferred are muteins having 80%,
90% or 95% sequence homology to the wild type protein, in which sequence
homology is measured by any common sequence analysis algorithm, such as Gap or
Bestfit.

A "derivative" refers to polypeptides or fragments thereof that are
substantially homologous in primary structural sequence but which include, e.g., in
vivo or in vitro chemical and biochemical modifications or which incorporate unusual
amino acids. Such modifications include, for example, acetylation, carboxylation,
phosphorylation, glycosylation, ubiquitination, labeling, e.g., with radionuclides, and
various enzymatic modifications, as will be readily appreciated by those well skilled in
the art. A variety of methods for labeling polypeptides and of substituents or labels
useful for such purposes are well known in the art, and include radioactive isotopes
such as 125I, 32P, 35S, and 3H, ligands which bind to labeled antiligands (e.g.,
antibodies), fluorophores, chemiluminescent agents, enzymes, and antiligands which
can serve as specific binding pair members for a labeled ligand. The choice of label
depends on the sensitivity required, ease of conjugation with the primer, stability
requirements, and available instrumentation. Methods for labeling polypeptides are
well known in the art. See Ausubel et al., 1992, hereby incorporated by reference.

The term "fusion protein" refers to polypeptides comprising
polypeptides or fragments coupled to heterologous amino acid sequences. Fusion
proteins are useful because they can be constructed to contain two or more desired
functional elements from two or more different proteins. Fusion proteins can be
produced recombinantly by constructing a nucleic acid sequence which encodes the
polypeptide or a fragment thereof in frame with a nucleic acid sequence encoding a
different protein or peptide and then expressing the fusion protein. Alternatively, a
fusion protein can be produced chemically by crosslinking the polypeptide or a
fragment thereof to another protein.

An "isolated" or "substantially pure" nucleic acid or polynucleotide
(e.g., an RNA, DNA or a mixed polymer) is one which is substantially separated from
other cellular components that naturally accompany the native polynucleotide in its
natural host cell, e.g., ribosomes, polymerases, or genomic sequences with which it is
naturally associated. The term embraces a nucleic acid or polynucleotide that has been
removed from its naturally occurring environment. The term "isolated" or
"substantially pure" also can be used in reference to recombinant or cloned DNA
isolates, chemically synthesized polynucleotide analogs, or polynucleotide analogs that
are biologically synthesized by heterologous systems.

The term "percent sequence identity" or "identical" in the context of
nucleic acid sequences refers to the residues in the two sequences which are the same
when aligned for maximum correspondence. The length of sequence identity
comparison may be over a stretch of at least about nine nucleotides, usually at least
about 20 nucleotides, more usually at least about 24 nucleotides, typically at least
about 28 nucleotides, more typically at least about 32 nucleotides, and preferably at
least about 36 or more nucleotides. There are a number of different algorithms known
in the art which can be used to measure nucleotide sequence identity. For instance,
polynucleotide sequences can be compared using Fasta, a program in GCG Version
6.1. Fasta provides alignments and percent sequence identity of the regions of the best
overlap between the query and search sequences (Pearson, 1990, herein incorporated
by reference). For instance, percent sequence identity between nucleic acid sequences
can be determined using Fasta with its default parameters (a word size of 6 and the
NOPAM factor for the scoring matrix) as provided in GCG Version 6.1, herein
incorporated by reference.

The term "substantial homology" or "substantial similarity," when
referring to a nucleic acid or fragment thereof, indicates that, when optimally aligned
with appropriate nucleotide insertions or deletions with another nucleic acid (or its
complementary strand), there is nucleotide sequence identity in at least about 60% of
the nucleotide bases, usually at least about 70%, more usually at least about 80%,
preferably at least about 90%, and more preferably at least about 95-98% of the
nucleotide bases, as measured by any well-known algorithm of sequence identity, such
as Fasta, as discussed above.
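
As a minimal illustration of the percent identity calculation, the following Python sketch counts matching residues in two nucleic acid sequences that have already been aligned. It does not perform the alignment itself and is not the Fasta algorithm. The function names are hypothetical, and two conventions are assumptions of the sketch: columns containing a gap character are skipped, and the 60% default threshold is the lowest tier recited above.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of compared positions at which two aligned sequences match.

    Assumption: columns in which either sequence has a gap are skipped,
    which is only one of several conventions in use.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == gap or y == gap:
            continue
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


def is_substantially_homologous(identity_percent, threshold=60.0):
    """Compare a percent identity with a chosen threshold (default 60%)."""
    return identity_percent >= threshold
```

A stricter tier (for example 90%) can be applied by passing a different `threshold` value.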

Alternatively, substantial homology or similarity exists when a nucleic
acid or fragment thereof hybridizes to another nucleic acid, to a strand of another
nucleic acid, or to the complementary strand thereof, under selective hybridization
conditions. Typically, selective hybridization will occur when there is at least about
55% sequence identity (preferably at least about 65%, more preferably at least about
75%, and most preferably at least about 90%) over a stretch of at least about 14
nucleotides. See, e.g., Kanehisa, 1984, herein incorporated by reference.

Nucleic acid hybridization will be affected by such conditions as salt
concentration, temperature, solvents, the base composition of the hybridizing species,
length of the complementary regions, and the number of nucleotide base mismatches
between the hybridizing nucleic acids, as will be readily appreciated by those skilled in
the art. "Stringent hybridization conditions" and "stringent wash conditions" in the
context of nucleic acid hybridization experiments depend upon a number of different
physical parameters. The most important parameters include temperature of
hybridization, base composition of the nucleic acids, salt concentration and length of
the nucleic acid. One having ordinary skill in the art knows how to vary these
parameters to achieve a particular stringency of hybridization. In general, "stringent
hybridization" is performed at about 25°C below the thermal melting point ($T_m$) for the
specific DNA hybrid under a particular set of conditions. "Stringent washing" is
performed at temperatures about 5°C lower than the $T_m$ for the specific DNA hybrid
under a particular set of conditions. The $T_m$ is the temperature at which 50% of the
target sequence hybridizes to a perfectly matched probe. See Sambrook et al., page
9.51, hereby incorporated by reference.

The $T_m$ for a particular DNA-DNA hybrid can be estimated by the
formula:

$$T_m = 81.5\,^{\circ}\mathrm{C} + 16.6\,(\log_{10}[\mathrm{Na}^+]) + 0.41\,(\text{fraction G + C}) - 0.63\,(\%\ \text{formamide}) - (600/l)$$

where $l$ is the length of the hybrid in base pairs.

The $T_m$ for a particular RNA-RNA hybrid can be estimated by the
formula:

$$T_m = 79.8\,^{\circ}\mathrm{C} + 18.5\,(\log_{10}[\mathrm{Na}^+]) + 0.58\,(\text{fraction G + C}) + 11.8\,(\text{fraction G + C})^2 - 0.35\,(\%\ \text{formamide}) - (820/l).$$

The $T_m$ for a particular RNA-DNA hybrid can be estimated by the
formula:

$$T_m = 79.8\,^{\circ}\mathrm{C} + 18.5\,(\log_{10}[\mathrm{Na}^+]) + 0.58\,(\text{fraction G + C}) + 11.8\,(\text{fraction G + C})^2 - 0.50\,(\%\ \text{formamide}) - (820/l).$$
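
The three estimates above, together with the stringent hybridization and wash temperatures defined earlier (about 25°C and about 5°C below the melting temperature), can be sketched in Python as follows. The function and parameter names are hypothetical. The G + C term is passed exactly as the formulas name it, as a fraction between 0 and 1; this is an assumption about units, and the cited reference should be consulted before relying on the numerical results.

```python
import math


def tm_dna_dna(na_molar, gc_fraction, pct_formamide, length_bp):
    """Estimated melting temperature (deg C) of a DNA-DNA hybrid."""
    return (81.5 + 16.6 * math.log10(na_molar) + 0.41 * gc_fraction
            - 0.63 * pct_formamide - 600.0 / length_bp)


def _tm_rna(na_molar, gc_fraction, pct_formamide, length_bp, formamide_coeff):
    # Shared form of the RNA-RNA and RNA-DNA formulas, which differ only
    # in the coefficient applied to the formamide percentage.
    return (79.8 + 18.5 * math.log10(na_molar) + 0.58 * gc_fraction
            + 11.8 * gc_fraction ** 2
            - formamide_coeff * pct_formamide - 820.0 / length_bp)


def tm_rna_rna(na_molar, gc_fraction, pct_formamide, length_bp):
    """Estimated melting temperature (deg C) of an RNA-RNA hybrid."""
    return _tm_rna(na_molar, gc_fraction, pct_formamide, length_bp, 0.35)


def tm_rna_dna(na_molar, gc_fraction, pct_formamide, length_bp):
    """Estimated melting temperature (deg C) of an RNA-DNA hybrid."""
    return _tm_rna(na_molar, gc_fraction, pct_formamide, length_bp, 0.50)


def stringent_temperatures(tm):
    """Return (hybridization, wash) temperatures: Tm - 25 and Tm - 5 deg C."""
    return tm - 25.0, tm - 5.0
```

Each function takes the sodium ion concentration in moles per liter, the G + C fraction, the formamide percentage and the hybrid length in base pairs.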

In general, the $T_m$ decreases by 1-1.5°C for each 1% of mismatch
between two nucleic acid sequences. Thus, one having ordinary skill in the art can
alter hybridization and/or washing conditions to obtain sequences that have higher or
lower degrees of sequence identity to the target nucleic acid. For instance, to obtain
hybridizing nucleic acids that contain up to 10% mismatch from the target nucleic acid
sequence, 10-15°C would be subtracted from the calculated $T_m$ of a perfectly matched
hybrid, and then the hybridization and washing temperatures adjusted accordingly.
Probe sequences may also hybridize specifically to duplex DNA under certain
conditions to form triplex or other higher order DNA complexes. The preparation of
such probes and suitable hybridization conditions are well known in the art.
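
The mismatch adjustment described above is simple arithmetic and can be sketched as follows. The function name `tm_range_with_mismatch` is hypothetical; the sketch applies the stated decrease of 1 to 1.5°C per 1% of mismatch to the melting temperature calculated for a perfectly matched hybrid.

```python
from typing import Tuple


def tm_range_with_mismatch(tm_perfect: float, pct_mismatch: float) -> Tuple[float, float]:
    """Return the (lowest, highest) expected Tm for a mismatched hybrid.

    Applies a decrease of 1.5 deg C (lower bound) and 1.0 deg C (upper
    bound) for each 1% of mismatch.
    """
    if pct_mismatch < 0:
        raise ValueError("percent mismatch must be non-negative")
    lowest = tm_perfect - 1.5 * pct_mismatch
    highest = tm_perfect - 1.0 * pct_mismatch
    return lowest, highest
```

For a perfectly matched hybrid with a calculated melting temperature of 70°C and up to 10% mismatch, the function returns a range 10 to 15°C below 70°C, consistent with the example in the text.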

An example of stringent hybridization conditions for hybridization of
complementary nucleic acid sequences having more than 100 complementary residues
on a filter in a Southern or Northern blot or for screening a library is 50%
formamide/6X SSC at 42°C for at least ten hours. Another example of stringent
hybridization conditions is 6X SSC at 68°C for at least ten hours. An example of low
stringency hybridization conditions for hybridization of complementary nucleic acid
sequences having more than 100 complementary residues on a filter in a Southern or
Northern blot or for screening a library is 6X SSC at 42°C for at least ten hours.
Hybridization conditions to identify nucleic acid sequences that are similar but not
identical can be identified by experimentally changing the hybridization temperature
from 68°C to 42°C while keeping the salt concentration constant (6X SSC), or
keeping the hybridization temperature and salt concentration constant (e.g., 42°C and
6X SSC) and varying the formamide concentration from 50% to 0%. Hybridization
buffers may also include blocking agents to lower background. These agents are
well known in the art. See Sambrook et al., pages 8.46 and 9.46-9.58, herein incorporated
by reference.

Wash conditions also can be altered to change stringency conditions.
An example of stringent wash conditions is a 0.2x SSC wash at 65°C for 15 minutes
(see Sambrook et al., for SSC buffer). Often the high stringency wash is preceded by a
low stringency wash to remove excess probe. An exemplary medium stringency wash
for duplex DNA of more than 100 base pairs is 1x SSC at 45°C for 15 minutes. An
exemplary low stringency wash for such a duplex is 4x SSC at 40°C for 15 minutes.
In general, a signal-to-noise ratio of 2x or higher than that observed for an unrelated
probe in the particular hybridization assay indicates detection of a specific
hybridization.

As defined herein, nucleic acids that do not hybridize to each other
under stringent conditions are still substantially homologous to one another if they
encode polypeptides that are substantially identical to each other. This occurs, for
example, when a nucleic acid is created synthetically or recombinantly using a high
codon degeneracy as permitted by the redundancy of the genetic code.

The polynucleotides of this invention may include both sense and
antisense strands of RNA, cDNA, genomic DNA and synthetic forms and mixed
polymers of the above. They may be modified chemically or biochemically or may
contain non-natural or derivatized nucleotide bases, as will be readily appreciated by
those of skill in the art. Such modifications include, for example, labels, methylation,
substitution of one or more of the naturally occurring nucleotides with an analog,
internucleotide modifications such as uncharged linkages (e.g., methyl phosphonates,
phosphotriesters, phosphoramidates, carbamates, etc.), charged linkages (e.g.,
phosphorothioates, phosphorodithioates, etc.), pendent moieties (e.g., polypeptides),
intercalators (e.g., acridine, psoralen, etc.), chelators, alkylators, and modified linkages
(e.g., alpha anomeric nucleic acids, etc.). Also included are synthetic molecules that
mimic polynucleotides in their ability to bind to a designated sequence via hydrogen
bonding and other chemical interactions. Such molecules are known in the art and
include, for example, those in which peptide linkages substitute for phosphate linkages
in the backbone of the molecule.

"Conservatively modified variations" or "conservatively modified
variants" of a particular nucleic acid sequence refers to nucleic acids that encode
identical or essentially identical amino acid sequences or DNA sequences where no
amino acid sequence is encoded. Due to the degeneracy of the genetic code, a large
number of functionally identical nucleic acids encode any given polypeptide sequence.
When a nucleic acid sequence is changed at one or more positions with no
corresponding change in the amino acid sequence which it encodes, that mutation is
called a "silent mutation." Thus, one species of a conservatively modified variation
according to this invention is a silent mutation. Accordingly, every nucleic acid
sequence herein which encodes a polypeptide also describes every possible silent
mutation or variation.

Furthermore, one of skill in the art will recognize that individual
substitutions, deletions, additions and the like, which alter, add or delete a single amino
acid or a small percentage of amino acids (less than 5%, more typically less than 1%)
in an encoded sequence are "conservatively modified variations" or "conservatively
modified variants" where the alterations result in the substitution of one amino acid
with a chemically similar amino acid. Conservative substitution tables providing
functionally similar amino acids are well known in the art.

The term "antibody" refers to a polypeptide encoded by an
immunoglobulin gene, genes, or fragments thereof. The immunoglobulin genes include
the kappa, lambda, alpha, gamma, delta, epsilon and mu constant regions, as well as a
myriad of immunoglobulin variable regions. Light chains are classified as either kappa
or lambda. Heavy chains are classified as gamma, mu, alpha, delta, or epsilon, which
in turn define the immunoglobulin classes IgG, IgM, IgA, IgD and IgE, respectively.

Antibodies exist, for example, as intact immunoglobulins or as a number
of well-characterized fragments produced by digestion with various peptidases. For
example, pepsin digests an antibody below the disulfide linkages in the hinge region to
produce F(ab)'2, a dimer of Fab which itself is a light chain joined to a VH-CH1 by a
disulfide bond. The F(ab)'2 may be reduced under mild conditions to break the
disulfide linkage in the hinge region thereby converting the F(ab)'2 dimer to a Fab'
monomer. The Fab' monomer is essentially an Fab with part of the hinge region. See
Paul (1993) (incorporated herein by reference), for a detailed description of epitopes,
antibodies and antibody fragments. One of skill in the art recognizes that such Fab'
fragments may be synthesized de novo either chemically or using recombinant DNA
technology. Thus, as used herein, the term antibody includes antibody fragments
produced by the modification of whole antibodies or those synthesized de novo. The
term antibody also includes single-chain antibodies, which generally consist of the
variable domain of a heavy chain linked to the variable domain of a light chain. The
production of single-chain antibodies is well known in the art (see, e.g., U.S. Pat. No.
5,359,046). The antibodies of the present invention are optionally derived from
libraries of recombinant antibodies in phage or similar vectors (see, e.g., Huse et al.
(1989); Ward et al. (1989); Vaughan et al. (1996), which are incorporated herein by
reference).

As used herein, "epitope" refers to an antigenic determinant of a
polypeptide, i.e., a region of a polypeptide that provokes an immunological response in
a host. This region need not comprise consecutive amino acids. The term epitope is
also known in the art as "antigenic determinant." An epitope may comprise as few as
three amino acids in a spatial conformation which is unique to the immune system of
the host. Generally, an epitope consists of at least five such amino acids, and more
usually consists of at least 8-10 such amino acids. Methods for determining the spatial
conformation of such amino acids are known in the art.

Methods for Analyzing ORF Gene Expression

The cell's ability to monitor its own biochemical ecology may be
considered as a fully integrated multi-dimensional set of specific biochemical assays.
The data from each individual assay manifests itself either directly or indirectly in the
change in expression of a single gene or small set of genes. The individual components
of the assaying capabilities of the cell may be extracted by measuring the changes in
global gene expression in response to a controlled experimental challenge.

The measurement of global gene expression may be done by a number
of different methods. One technique is that of hybridization to nucleic acid arrays on
solid surfaces, such as "gene chips" (Fodor et al., 1991). Another method uses a
reporter construct in the GRM or an equivalent matrix comprising living cells,
preferably eukaryotic cells, and more preferably yeast, insect, plant, avian, fish or
mammalian cultured cells. Other methods include SAGE.



DNA Chip Technology 

One method for determining comprehensive gene expression profiles is 
DNA gene chip technology (see, e.g., Fodor et al., 1991). A DNA gene chip can be 
made comprising a large number of immobilized single-stranded nucleic acids, each of 
which hybridizes specifically to a gene or its mRNA, representing a particular genome 
or a significant subset thereof. Messenger RNA molecules extracted from a cell or 
cDNA molecules converted from such mRNA molecules can be labeled. The labeling 
can be accomplished, for example, radioisotopically or fluorescently by methods well 
known in the art. These mRNA or cDNA molecules are rendered single-stranded and 
then allowed to hybridize to the immobilized single-stranded nucleic acids on the gene 
chip. A computer equipped with a scanner then determines the extent of hybridization, 
thereby quantitating the amount of mRNA produced for any given gene or genetic 
sequence. 

Profiles of gene expression generated under different conditions or in
response to different stimuli such as treatment with chemical compounds are produced
by treating cells with a compound, isolating the mRNA from the cells, optionally producing
cDNA and then hybridizing to the single-stranded nucleic acids on the gene chip as
discussed above. Preferably, software is used to correlate the expression of each gene
on the hybridization chip relative to other genes under different conditions or in
response to different treatments (see below).

Promoter elements from genes of interest that respond to an input 
signal can then be isolated and operatively linked to a reporter gene described above by 
recombinant DNA techniques well known in the art for further characterization. 

Genome Reporter Matrix™ Technology 

An alternative method to DNA gene chip technology is the use of a 
Genome Reporter Matrix™ (GRM), or an equivalent thereof. The generation of gene
expression profiles utilizing the Genome Reporter Matrix™, as summarized below,
has been described essentially in United States Patents 5,569,888 and 5,777,888, both
of which are incorporated herein by reference.

The promoter (and optionally, 5' upstream regulatory elements and/or 5' 
upstream untranslated sequences) of an ORF or a gene from a cellular genome 
(preferably a eukaryotic genome) is fused to a reporter gene creating a transcriptional 
and/or translational fusion of the promoter to the reporter gene. In a preferred 
embodiment, the genome is that of S. cerevisiae. The promoter and optional 
additional sequences comprise all the regulatory elements necessary for transcriptional 
(and optionally translational) control of an attached coding sequence. The reporter 
gene can be any gene that, when expressed in a suitable host, encodes a product that 
can be detected by a quantitative assay. Any suitable assay may be used, including but 
not limited to enzymatic, colorimetric, fluorescence or other spectrographic assays, 
fluorescent activated cell sorting assay and immunological assays. Examples of 
suitable reporter genes include, inter alia, green fluorescent protein (GFP), β-lactamase,
lacZ, invertase, membrane bound proteins (e.g., CD2, CD4, CD8, the
influenza hemagglutinin protein, and others well known in the art) to which high
affinity antibodies directed to them exist or can be made routinely, and fusion proteins
comprising a membrane bound protein appropriately fused to an antigen tag domain
(e.g., hemagglutinin or Myc and others well known in the art). In a preferred
embodiment, the reporter protein is GFP from the jellyfish Aequorea victoria. GFP is
a naturally fluorescing protein that does not require the addition of any exogenous
substrates for activity. The ability to measure GFP fluorescence in intact living cells
makes it an ideal reporter protein for the GRM or an equivalent matrix comprising
living cells.

In a preferred embodiment, reporter constructs comprise the 5' region 
of the ORF comprising the promoter of the ORF and other expression regulatory 
sequences, and generally the first four codons of the ORF fused in-frame to the green
fluorescent protein. In a more preferred embodiment, approximately 1200 base-pairs
of 5' regulatory sequence are included in each fusion. Only 228 yeast ORFs (3.5%)
possess introns. Of these 228 intron-containing ORFs, all but four contain only one
intron. In these ORFs, fusions are created two to four codons past (3' to) the splice
junction. Therefore, these fusions must undergo splicing in order to create a functional
reporter fusion.

Each reporter is assembled in an episomal yeast shuttle vector (either
CEN or 2µ plasmid) or on a yeast integrating vector for subsequent insertion into the
chromosomal DNA. In a preferred embodiment, the gene reporter constructs are built
using a yeast multicopy vector. A multicopy vector is chosen to facilitate easy transfer
of the reporter constructs to many different yeast strain backgrounds. In addition, the
vector replicates at an average of 10 to 20 copies per cell, providing added sensitivity
for detecting genes that are expressed at a low level. In principle, introducing
additional copies of a gene's regulatory region could, through titration of regulatory
proteins, disrupt a response of interest. However, in practice this appears not to occur,
and efforts to successfully exploit such titration effects have required much higher copy
number vectors and have been largely unsuccessful. In another preferred embodiment,
the reporter constructs are maintained on episomal plasmids in yeast.

In one embodiment, a plurality (all or a significant subset) of the
resulting approximately 6,000 reporter constructs is transformed into a strain of yeast.
The resulting strains constitute one embodiment of the Genome Reporter Matrix™.
See Example 1.

Profiles are produced by arraying wild type or mutant cells carrying the
reporter fusion genes in growth media containing different drugs and chemical
compounds and measuring changes in expression of the reporter gene by the
appropriate assay (see below). In a preferred embodiment, where the reporter gene is
GFP, changes in expression are measured by recording the amount of
green light produced by the cells over time with an automated fluorescence scanner.
Alternatively, the drugs or chemical compounds may be added to the yeast cells after
they have been arrayed onto growth media, and changes in reporter
gene expression are then measured by the appropriate assay.

Over 93% of the reporters are detectable over background on rich
medium. The reproducibility of individual reporters is high, with expression generally
varying by less than 10%. In contrast, hybridization experiments have proven
unreliable for effects of less than a factor of two. Figure 64 depicts expression data of
the GRM from two independent experiments plotted against each other.

In a preferred embodiment, the GRM is used to obtain gene expression
information from a genome. The GRM is preferred to hybridization-based methods of
profiling for several reasons. First, because the promoter-reporter fusions include the
first four amino acids of the native gene product, the response profiles are composites
of both transcriptional and translational effects. The importance of being able to
monitor both levels of response is underscored by the experience with bacterial
antibiotics. Those antibiotics that work at the translational level have a greater
therapeutic performance than those affecting transcription. Because hybridization-based
methods can reveal only effects on transcription, profiling with the GRM
provides a more complete view of the full spectrum of biological effects induced by
exposure to drugs or compounds.

Second, the GRM permits profiling of gene expression changes in living 
cells, which permits one to easily measure the kinetics of changes in gene response 
profiles in the same population of cells following exposure to different drugs and 
chemical agents. Thus, by collecting multiple data sets over time, one can identify the
genes that make up primary and secondary responses. 

Third, hybridization-based methods require relatively sophisticated
molecular procedures to produce labeled cDNA, followed by a 14-hour hybridization
of labeled cDNA probes to target DNA arrays on slides or chips. The GRM requires
only the ability to produce arrays of colonies and measure emitted light. These
procedures are easier to scale up in an industrial setting than are sophisticated
molecular biology methods, rendering data that are more straightforward to produce and
more reproducible in nature.

Gene Expression Profiles

Using the reporter construct, gene chip technology or another method
for obtaining genome-wide gene expression, the gene expression profile of yeast genes
can be obtained. In a preferred embodiment, either the GRM or gene chip technology
is used. In a more preferred embodiment, the GRM is treated with a number of
pharmaceutical compounds and the resulting expression of the reporter constructs is
analyzed. Generally, for each pharmaceutical compound, the expression of the
reporter constructs is analyzed in the presence of the vehicle for the pharmaceutical
compound alone and is compared to the expression of the reporter constructs in the
presence of the pharmaceutical compound. The change in expression of the reporter
constructs between the absence and presence of the pharmaceutical compound is obtained
either by subtracting the baseline level of expression from the level after treatment or
by dividing the level after treatment by the baseline level of expression. By looking at a
large number of reporter constructs, one can assign yeast ORFs to functional groups
based upon their expression patterns in response to various pharmaceutical
compounds. These functional groups may provide valuable information as to the
function of the yeast proteins as well as their human, non-human mammalian, avian,
fish, insect and plant counterparts.
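
By way of illustration only, the comparison of vehicle-only and compound-treated
expression described above might be computed as in the following minimal sketch. The
function name, the reporter names and the signal values are hypothetical and are not
taken from the specification.

```python
def expression_change(vehicle, treated, mode="ratio"):
    """Compare reporter signal with compound against vehicle alone.

    vehicle, treated: dicts mapping a reporter (ORF) name to its measured signal.
    mode: "difference" (treated - vehicle) or "ratio" (treated / vehicle).
    """
    if mode not in ("difference", "ratio"):
        raise ValueError("mode must be 'difference' or 'ratio'")
    changes = {}
    for orf, baseline in vehicle.items():
        after = treated[orf]
        if mode == "difference":
            changes[orf] = after - baseline
        else:
            if baseline <= 0:
                raise ValueError(f"baseline for {orf} must be positive for a ratio")
            changes[orf] = after / baseline
    return changes


# Hypothetical fluorescence readings for two reporters.
vehicle_only = {"ORF_A": 100.0, "ORF_B": 50.0}
with_compound = {"ORF_A": 400.0, "ORF_B": 25.0}

ratios = expression_change(vehicle_only, with_compound, mode="ratio")
differences = expression_change(vehicle_only, with_compound, mode="difference")
```

In this sketch a ratio above 1 indicates induction and a ratio below 1 indicates
repression; the difference form retains the units of the underlying assay.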

Preferably, software is used to correlate the expression of each gene in
the GRM or on the DNA chip relative to other genes under different conditions and in
response to different pharmaceutical compounds. In one preferred embodiment, the
software is capable of producing a correlation coefficient for each gene's expression
relative to every other gene across all expression profiles in a database. Such analysis
reveals groups of genes that exhibit coordinate regulation (regulons). See, e.g., U.S.
Serial No. 09/076,668, now pending; Eisen et al. (1998); and Tamayo et al. (1999).
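
As a non-limiting sketch of such an analysis, a correlation coefficient for every pair
of genes can be computed from their responses across the profiles in a database. The
specification does not fix a particular coefficient; the Pearson coefficient used here,
the function names and the example values are assumptions made for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length response vectors."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a flat profile carries no correlation information
    return sxy / math.sqrt(sxx * syy)


def correlation_table(profiles):
    """Map every ordered gene pair to its correlation coefficient.

    profiles: dict of gene name -> list of responses, one per treatment,
    with treatments in the same order for every gene.
    """
    genes = sorted(profiles)
    return {(g, h): pearson(profiles[g], profiles[h]) for g in genes for h in genes}


# Hypothetical responses of three genes to four treatments.
example_profiles = {
    "GENE_A": [1.0, 2.0, 3.0, 4.0],
    "GENE_B": [2.0, 4.0, 6.0, 8.0],
    "GENE_C": [4.0, 3.0, 2.0, 1.0],
}
table = correlation_table(example_profiles)
```

Genes whose coefficients with one another are high across many treatments are
candidates for membership in a common regulon.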

In a preferred embodiment, a gene of unknown function may be placed 
into a functional genetic group by the following steps: 

a) generating a gene expression profile for Gene X, a gene of
unknown function;

b) comparing the gene expression profile of Gene X with
expression profiles of a plurality of other genes in a database of
compiled gene expression profiles to generate expression
correlation coefficients;

c) identifying based on their expression correlation coefficients a
set of genes comprising Gene X that are coordinately expressed;

d) determining if the genes whose expression is most highly
correlated with that of Gene X belong to a gene regulon
involved in a known biological pathway, or a common set of
biological reactions or functions; and

e) optionally testing the effect on Gene X expression of at least
one altered condition or treatment known to affect the function
to which Gene X has been ascribed.

If Gene X expression is coordinate with expression of the regulon, then Gene X is
placed in the regulon.
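
Steps b) through d) above can be sketched as follows, assuming that correlation
coefficients between Gene X and the other genes have already been generated. The
majority-vote rule, the default cutoffs and all gene and regulon names are hypothetical
choices made for illustration, not requirements of the method.

```python
from collections import Counter


def assign_regulon(corr_to_x, regulon_of, top_n=3, min_corr=0.5):
    """Suggest a regulon for Gene X from its most highly correlated genes.

    corr_to_x: dict of gene name -> correlation coefficient with Gene X.
    regulon_of: dict of gene name -> name of the regulon the gene belongs to.
    Returns the regulon most represented among the top correlates, or None.
    """
    ranked = sorted(corr_to_x.items(), key=lambda item: item[1], reverse=True)
    # Keep only the strongest correlates that also clear the cutoff.
    top = [gene for gene, r in ranked[:top_n] if r >= min_corr]
    votes = Counter(regulon_of[gene] for gene in top if gene in regulon_of)
    if not votes:
        return None
    return votes.most_common(1)[0][0]


# Hypothetical correlations between Gene X and four characterized genes.
correlations = {"KNOWN_1": 0.92, "KNOWN_2": 0.88, "KNOWN_3": 0.15, "KNOWN_4": 0.81}
membership = {
    "KNOWN_1": "sterol biosynthesis",
    "KNOWN_2": "sterol biosynthesis",
    "KNOWN_3": "heat shock",
    "KNOWN_4": "sterol biosynthesis",
}
suggested = assign_regulon(correlations, membership)
```

A suggestion produced in this way would still be subject to the optional confirmation
of step e).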

Methods to Identify Potential RIGs

A GRM (or an equivalent) is chemically treated with a large number of 
compounds. Regulons are identified as groups of genes that are coordinately regulated 
in response to genetic mutations, treatment with compounds or different environmental 
conditions. In a preferred embodiment, regulons are identified using correlation 
coefficients assembled by software that does clustering analysis, such as that described
in U.S. Serial No. 09/076,668, now pending; Eisen et al. (1998); and Tamayo et al.
(1999). In a preferred embodiment, genes that constitute a regulon have a correlation
coefficient of greater than 0.5. In a more preferred embodiment, genes that constitute
a regulon have a correlation coefficient of at least 0.6 or 0.7. In a further preferred
embodiment, genes that constitute a regulon have a correlation coefficient of at least
0.8 or 0.9. The correlation coefficient may be measured by any method of obtaining
correlation coefficients, including, without limitation, the method described in United
States Patent Application Serial No. 09/076,668, now pending, or in Eisen et al.
(1998).
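
For illustration, the correlation cutoffs above can be applied to a table of pairwise
coefficients to propose candidate regulons. The sketch below links any two genes whose
coefficient exceeds the cutoff and reports each connected group; this single-linkage
grouping is an assumption chosen for brevity and is not the clustering method of the
cited references. Gene names and values are hypothetical.

```python
def regulons_by_threshold(pair_corr, threshold=0.5):
    """Group genes linked by correlation coefficients above a cutoff.

    pair_corr: dict of (gene, gene) -> correlation coefficient.
    Returns a list of sorted gene lists, one per group of two or more genes.
    """
    neighbors = {}
    for (g, h), r in pair_corr.items():
        neighbors.setdefault(g, set())
        neighbors.setdefault(h, set())
        if g != h and r > threshold:
            neighbors[g].add(h)
            neighbors[h].add(g)

    seen = set()
    groups = []
    for gene in sorted(neighbors):
        if gene in seen:
            continue
        # Depth-first walk over everything reachable from this gene.
        stack = [gene]
        group = set()
        while stack:
            current = stack.pop()
            if current in group:
                continue
            group.add(current)
            stack.extend(neighbors[current] - group)
        seen |= group
        if len(group) > 1:
            groups.append(sorted(group))
    return groups


# Hypothetical pairwise coefficients among five genes.
pairs = {
    ("A", "B"): 0.9,
    ("B", "C"): 0.7,
    ("A", "C"): 0.4,
    ("C", "D"): 0.2,
    ("D", "E"): 0.85,
}
loose = regulons_by_threshold(pairs, threshold=0.5)
strict = regulons_by_threshold(pairs, threshold=0.8)
```

Note that single linkage places genes A and C in one group at the 0.5 cutoff even
though their direct coefficient is 0.4; a stricter grouping rule would require every
pair in a group to exceed the cutoff.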

Once a set of genes has been grouped into a regulon, one can
identify potential regulon indicator genes (RIGs), which may or may not be members
of the regulon, pathway or process for which
they are indicators. RIGs may be either characterized or uncharacterized genes
provided they have certain characteristics. Preferred characteristics include one or more
of the following: 1) its expression profile is sensitive to one or more stimuli; 2) its
expression profile exhibits a large dynamic range in response to one or more stimuli; 3)
its expression profile exhibits a rapid kinetic response to one or more stimuli; 4) its
expression profile is specific to a known biological pathway or a common set of
biological reactions or functions; 5) the regulon indicator gene does not contain
sequences that are problematic for maintaining on plasmids when introduced into host
cells. Most preferably, the expression of a RIG is relatively specific for a particular
biochemical pathway or cellular condition, highly sensitive to small changes in
activation of a biochemical pathway or cellular condition, and exhibits a wide dynamic
range so that the RIG is easier to assay.

A "large dynamic range" is one in which the response in gene 
expression in response to a stimulus is at least four-fold over basal levels of expression 
in the absence of the stimulus. A response may be either an increase or a decrease in 
gene expression. In a preferred embodiment, the response is at least ten-fold over
basal levels. In a more preferred embodiment, the response is at least twenty-fold over
basal levels. In an even more preferred embodiment, the response is at least 100-fold
over basal levels.
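
The fold-change tiers defining a large dynamic range may be illustrated by the
following sketch. It assumes that a decrease is scored by its fold reduction, so that a
drop to one-fourth of the basal level counts as a four-fold response; that reading, the
function name and the tier labels are assumptions made for illustration.

```python
def dynamic_range_tier(basal, stimulated):
    """Return (fold change, tier label) for a response to a stimulus.

    The fold change is taken in whichever direction the expression moved,
    so an increase and a decrease of the same magnitude score alike.
    """
    if basal <= 0 or stimulated <= 0:
        raise ValueError("expression levels must be positive")
    fold = max(stimulated / basal, basal / stimulated)
    tiers = (
        (100.0, "even more preferred"),
        (20.0, "more preferred"),
        (10.0, "preferred"),
        (4.0, "large dynamic range"),
    )
    for cutoff, label in tiers:
        if fold >= cutoff:
            return fold, label
    return fold, None


# Hypothetical basal and stimulated expression levels.
induced = dynamic_range_tier(basal=10.0, stimulated=50.0)
repressed = dynamic_range_tier(basal=200.0, stimulated=10.0)
modest = dynamic_range_tier(basal=10.0, stimulated=20.0)
```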

A "rapid kinetic response" is one in which the response occurs in the 
same time period as the doubling time of the organism after stimulation with the 
stimulus. In a preferred embodiment, the response occurs in less than 10 minutes. In a
more preferred embodiment, the response occurs in less than one minute.

A "stimulus" or "stimuli" is a chemical compound, a genetic mutation, 
or a change in the environment of the cell, including, without limitation, a change in 
pH, temperature, osmotic pressure, salinity, light, gas concentration or partial pressure
(e.g., O2, CO2, CO or NO).

In order to determine whether a potential RIG is specific for a particular
biochemical pathway or cellular condition, expression of the potential RIG is examined
under all conditions in the expression database. A desirable RIG is one whose
expression is selectively induced or repressed by chemicals or mutations that are
known to affect the process in question. Likewise, a desirable RIG's expression is not
influenced by chemicals or mutations that are known not to affect the process in
question. This analysis provides information regarding whether the RIG participates in
additional cellular processes or biochemical pathways. When a potential RIG is not a
member of a target regulon, pathway or process, specificity is measured by analyzing
expression under all conditions under which the potential RIG is activated or repressed
to determine if similar conditions elicit similar responses.

Most preferably, a single RIG may be identified to be highly specific to
a particular pathway, i.e., wherein its expression changes only when a particular
pathway is activated or repressed, but not when other pathways are likewise regulated.
Such a highly specific regulon indicator gene cannot always be found for a pathway of
interest. In such cases, however, more than one RIG may be identified whose
coordinate expression patterns correlate with high specificity to a pathway of interest.
Preferably, the coordinate expression of two RIGs provides such specificity. However,
the present invention is not limited by the number of RIGs identified and used
simultaneously as regulated pathway indicators. Expression of each member of a
plurality of RIGs may independently increase or decrease when the biological pathway
of interest is activated or repressed.

For a potential RIG to be considered highly indicative of
activation of a particular pathway, the gene is activated or repressed to an
expression level at least 2-fold higher or lower (if the gene is repressed) than when the
pathway is not activated. In a preferred embodiment, the gene is activated or
repressed to an expression level at least 10-fold higher or lower than when the pathway
is not activated. In a more preferred embodiment, the gene is activated or repressed to an
expression level at least 20-fold higher or lower than when the pathway is not activated. The
expression level may be represented as a natural log ratio of treated/untreated
expression values. See Figure 37, for example. In a preferred embodiment, the natural
log ratio of a RIG is greater than 1, more preferably greater than 2.5, and even more
preferably greater than 4.0 when the pathway or process is activated.
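
The natural log ratio and its cutoffs can be related to fold changes as in the sketch
below. Natural log ratios of 1, 2.5 and 4.0 correspond to roughly 2.7-fold, 12-fold and
55-fold changes, respectively. Treating repression by the magnitude of a negative log
ratio is an assumption of this sketch, as are the function names and example values.

```python
import math


def ln_ratio(treated, untreated):
    """Natural log of the treated/untreated expression ratio."""
    if treated <= 0 or untreated <= 0:
        raise ValueError("expression values must be positive")
    return math.log(treated / untreated)


def rig_ratio_tier(value):
    """Label a natural log ratio by the highest cutoff its magnitude exceeds.

    Assumption: a repressed gene (negative ratio) is judged by magnitude.
    """
    magnitude = abs(value)
    for cutoff in (4.0, 2.5, 1.0):
        if magnitude > cutoff:
            return f"greater than {cutoff}"
    return None


# Fold changes implied by each cutoff: e**1, e**2.5 and e**4.
implied_folds = [math.exp(cutoff) for cutoff in (1.0, 2.5, 4.0)]

# Hypothetical expression values.
twenty_fold_up = rig_ratio_tier(ln_ratio(20.0, 1.0))
hundred_fold_down = rig_ratio_tier(ln_ratio(1.0, 100.0))
two_fold_up = rig_ratio_tier(ln_ratio(2.0, 1.0))
```

On this scale a 2-fold change has a natural log ratio of about 0.69 and therefore falls
below the first cutoff.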

In order to determine the dynamic range of a potential RIG, the
expression of the RIG is assessed by examining its expression in response to all the
treatments and mutations in the database. In a preferred embodiment, there is a high
level of change in RIG expression for small changes in activation of the pathway.

In one embodiment of the invention, expression of a regulon indicator
gene correlates with the expression of at least one known gene in a group of
coordinately expressed genes or provides a measure of the function of a biological
process of interest. The RIG is identified by a method comprising the steps of:

a) comparing gene expression profiles of a plurality of genes in the
database to generate expression correlation coefficients;

b) identifying based on their relative expression correlation
coefficients a set of genes that are coordinately expressed;

c) selecting a set of genes from b) which comprises one or more
genes known to function in a particular biological pathway, or a
common set of biological reactions or functions;

d) selecting a member of the set of c) having one or more of the
following characteristics:

1) its expression profile is sensitive to one or more stimuli;

2) its expression profile exhibits a large dynamic range in
response to one or more stimuli;

3) its expression profile exhibits a rapid kinetic response to
one or more stimuli;

4) its expression profile is specific to a known biological
pathway or a common set of biological reactions or
functions;

5) the regulon indicator gene does not contain sequences
that are problematic for maintaining on plasmids when
introduced into host cells.

The RIG may also be co-regulated with one or more genes in the group
of coordinately expressed genes of c) above. In addition, the RIG may control the
expression of at least one other gene in the group of coordinately expressed genes of c)
above. The RIG may be a gene of previously unknown function.

In another embodiment, the invention provides a method for identifying 
a regulon indicator gene in a database of compiled gene expression profiles, wherein 
expression of the regulon indicator gene provides a measure of the function of a 
biological pathway or process of interest. The method comprises the steps of: 
a) examining exemplary expression profiles in response to one or
more chemical or genetic treatments which target the pathway or process of interest to
generate reporter sensitivity data;

b) selecting a set of genes from a) which comprises one or more
genes most significantly affected in response to the treatment or treatments; and

c) selecting at least one gene from b) whose expression profile is
maximized for its specificity and sensitivity to the treatment or class of treatments in a)
compared to its sensitivity to all other treatments in the database.

The regulon indicator gene may be co-regulated with one or more
genes in the set of genes of a) or the regulon indicator gene, upon expression, controls
the expression of at least one other gene in the set of genes of a).
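
One way to make step c) concrete is to score each candidate gene by how strongly it
responds to the treatments that target the pathway of interest relative to all other
treatments in the database. The scoring rule below (mean response magnitude on target
minus mean response magnitude off target) is an assumption made for illustration, as
are the gene names, treatment names and values; other measures of specificity and
sensitivity could equally be used.

```python
def rig_scores(responses, on_target):
    """Score candidate genes for sensitivity and specificity.

    responses: dict of gene -> dict of treatment -> natural log ratio.
    on_target: set of treatments known to target the pathway of interest.
    """
    scores = {}
    for gene, by_treatment in responses.items():
        on = [abs(v) for t, v in by_treatment.items() if t in on_target]
        off = [abs(v) for t, v in by_treatment.items() if t not in on_target]
        if not on:
            continue  # gene was never measured under an on-target treatment
        sensitivity = sum(on) / len(on)
        background = sum(off) / len(off) if off else 0.0
        scores[gene] = sensitivity - background
    return scores


def best_rig(responses, on_target):
    """Return the candidate with the highest score, or None if there is none."""
    scores = rig_scores(responses, on_target)
    return max(scores, key=scores.get) if scores else None


# Hypothetical log ratios for two candidates under four treatments.
candidate_responses = {
    "GENE_A": {"drug_1": 3.0, "drug_2": 2.8, "drug_3": 0.1, "drug_4": -0.2},
    "GENE_B": {"drug_1": 3.5, "drug_2": 3.3, "drug_3": 2.9, "drug_4": 3.1},
}
pathway_treatments = {"drug_1", "drug_2"}
chosen = best_rig(candidate_responses, pathway_treatments)
```

In this example GENE_B responds more strongly to the on-target treatments but also
responds to unrelated treatments, so the more specific GENE_A is chosen.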

Methods to Identify Potential Target Genes and Targets

A regulon is identified as described above under "Methods to Identify 
Potential RIGs." In a preferred embodiment, a regulon will contain both characterized 
and uncharacterized genes. In many cases, the characterized genes will have a 
common function or will be part of the same biochemical pathway. For instance, a
regulon of the isoprenoid pathway will contain characterized genes involved in sterol
biosynthesis. Uncharacterized genes will then be analyzed in terms of whether they are
likely to be part of the same biochemical pathway as the characterized genes. The
sequence of uncharacterized genes will be compared to the sequence of genes of
known function to determine if the uncharacterized genes or their gene products have
any motifs common to characterized genes.

For instance, uncharacterized genes will be examined for domains
indicating enzymatic functions, including, without limitation, kinase, protease and
phosphorylase activities. Similarly, uncharacterized genes will be examined for
domains indicating that they might be transcription factors, including, without
limitation, zinc finger, PHD, steroid-binding and helix-loop-helix regions. Other
domains of interest include lipid-binding and ATP-binding domains. Uncharacterized
genes will also be examined for sequence similarities to secreted factors and receptors.
In a preferred embodiment, target genes and their encoded target proteins are
previously uncharacterized, are highly correlated with a particular regulon containing
genes for a specific pathway or process, and appear to be an enzyme, secreted
factor, receptor or transcription factor.

In a preferred embodiment, a novel regulon target gene may be selected
from a database of compiled gene expression profiles. The target gene is selected
by a method comprising the steps of:

a) comparing gene expression profiles of a plurality of genes in the
database to generate expression correlation coefficients;

b) identifying based on their expression correlation coefficients a
set of genes that are coordinately expressed;

c) selecting from b) a set of genes comprising one or more genes
of unknown function and one or more genes known to function
in a particular biological pathway, or a common set of biological
reactions or functions of interest;

d) selecting from the set of c) at least one gene of unknown
function, Gene X, as a novel regulon target gene; wherein Gene
X is a gene whose expression profile closely correlates to the
expression profiles of the one or more genes of the set of c)
known to function in the particular biological pathway, or
common set of biological reactions or functions of interest.
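
Steps c) and d) above may be sketched as follows, assuming the pairwise correlation
coefficients of steps a) and b) are already available. Ranking genes of unknown
function by their mean correlation with the known pathway genes, the 0.5 cutoff, and
all gene names are hypothetical choices made for illustration.

```python
def select_target_candidates(corr, known_pathway, unknown, min_corr=0.5):
    """Rank genes of unknown function by correlation with known pathway genes.

    corr: dict of frozenset({gene, gene}) -> correlation coefficient.
    known_pathway: genes known to function in the pathway of interest.
    unknown: genes of unknown function from the same coordinately expressed set.
    Returns (gene, mean correlation) pairs at or above min_corr, best first.
    """
    candidates = []
    for gene_x in unknown:
        values = [corr[frozenset((gene_x, known))] for known in known_pathway]
        mean_r = sum(values) / len(values)
        if mean_r >= min_corr:
            candidates.append((gene_x, mean_r))
    return sorted(candidates, key=lambda item: item[1], reverse=True)


# Hypothetical coefficients between two unknown and two known genes.
pairwise = {
    frozenset(("UNKNOWN_1", "KNOWN_1")): 0.9,
    frozenset(("UNKNOWN_1", "KNOWN_2")): 0.8,
    frozenset(("UNKNOWN_2", "KNOWN_1")): 0.1,
    frozenset(("UNKNOWN_2", "KNOWN_2")): 0.3,
}
ranked_targets = select_target_candidates(
    pairwise,
    known_pathway=["KNOWN_1", "KNOWN_2"],
    unknown=["UNKNOWN_1", "UNKNOWN_2"],
)
```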

The method may further comprise the step of generating individual
correlation coefficients between the gene expression profile of Gene X and a plurality
of genes in the database to assess the selectivity of Gene X as a novel regulon target
gene. The method may further comprise the step of determining whether the protein
encoded by Gene X exhibits substantial homology to a human, non-human mammal,
avian, amphibian, fish, insect or plant protein, including, without limitation, the step of
hybridizing Gene X to genomic DNA from human, non-human mammal, avian,
amphibian, fish, insect or plant cells or tissue under low stringency conditions,
comparing the DNA sequence of Gene X to the DNA sequences from other organisms,
or obtaining an amino acid sequence encoded by Gene X and comparing it to amino
acid sequences from other organisms. The DNA or amino acid sequences from other
organisms may be contained within a database and the DNA or amino acid sequence
encoded by Gene X may be compared to the DNA or amino acid sequences from other
organisms using a computer algorithm such as blastp, tblastn or another algorithm that
utilizes string alignments. The method for identifying a target may further comprise
the steps of:

a) disrupting the function of Gene X or its homolog in a yeast cell; and 

b) identifying whether the function of Gene X is essential for yeast 
germination, vegetative growth, pseudohyphal or hyphal growth. 

In another embodiment of the invention, genes that are regulated by
regulon target genes of yeast or their mammalian homologs may be identified. The
method comprises the steps of:

a) overexpressing the target gene in host cells of a matrix comprising a 
plurality of units of cells, the cells in each unit containing a reporter 
gene operably linked to an expression control sequence derived from a 
gene of a selected organism; and 

b) identifying genes that are either induced or repressed by overexpression 
of the target gene. 

In a preferred embodiment, the target gene is selected from the group 
consisting of YMR134w, YER034w, YJL105w, YKL077w, YGR046w, YJR041c,
YER044c and YLR100w and their mammalian homologs.

Methods for Constructing Mutant Yeast Strains 

Once a potential target has been identified, one may disrupt the gene to
determine the effect that inhibiting the gene's activity has on the phenotype of the yeast
cell. There are a number of methods well known in the art by which a person can
disrupt a particular gene in yeast. One of skill in the art can disrupt an entire gene and 
create a null allele, in which no portion of the gene is expressed. One may also 
produce and express an allele comprising a portion of the gene which is not sufficient 
for gene function. This may be done by inserting a nonsense codon into the sequence 
of the gene such that translation of the mutant mRNA transcript ends prematurely. 
One may also produce and express alleles containing point mutations, individually or in 
combination, that reduce or abolish gene function. 

There are a number of different strategies for creating conditional 
alleles of genes. Broadly, an allele can be conditional for function or expression. An 
example of an allele that is conditional for function is a temperature sensitive mutation 
where the gene product is functional at one temperature but non-functional at another, 
e.g., due to misfolding or mislocalization. One of ordinary skill in the art may produce 
mutant alleles which may have only one or a few altered nucleotides but which encode 
inactive or temperature-sensitive proteins. Temperature-sensitive mutant yeast strains
express a functional protein at permissive temperatures but do not express a functional 
protein at non-permissive temperatures. 

An example of an allele that is conditional for expression is a chimeric 
gene where a regulated promoter controls the expression of the gene. Under one 
condition the gene is expressed and under another it is not. One may replace or alter 
the endogenous promoter of the gene with a heterologous or altered promoter that can 
be activated only under certain conditions. These conditional mutants only express the 
gene under defined experimental conditions. In a preferred embodiment, the gene is 
under the control of a regulated promoter where the gene may be expressed at higher 
or lower levels depending upon the degree of activation of the promoter. For instance, 
a gene under the control of a regulated promoter may be expressed at any level 
between 0 and 100% of wild type expression, such as at 10%, 20%, 50% or 80% of its 
wild type level. The gene may also be expressed at levels above its usual wild type 
expression (overexpression). All of these methods are well known in the art. For 
example, see Stark (1998), Garfinkel et al. (1998), and Lawrence and Rothstein
(1991), herein incorporated by reference.

One having ordinary skill in the art also may decrease expression of a 
gene without disrupting or mutating the gene. For instance, one may decrease the 
expression of a gene by transforming yeast with an antisense molecule or ribozyme 
under the control of a regulated or constitutive promoter (see Nasr et al., 1995, herein 
incorporated by reference). One may introduce an antisense construct operably linked 
to an inducible promoter into S. cerevisiae to study the function of a conditional allele 
(see Nasr et al. supra). One problem that may be encountered, however, is that many 
antisense molecules do not work well in yeast, for reasons that are, as yet, unclear (see 
Atkins et al., 1994 and Olsson et al., 1997). 

One may also decrease gene expression by inserting a sequence by 
homologous recombination into or next to the gene of interest wherein the sequence 
targets the mRNA or the protein for degradation. For instance, one can introduce a 
construct that encodes ubiquitin such that a ubiquitin fusion protein is produced. This 
protein will be likely to have a shorter half-life than the wild type protein. See, e.g.,
Johnson et al. (1992), herein incorporated by reference. 

In a preferred mode, a gene of interest is completely disrupted in order 
to ensure that there is no residual function of the gene. One can disrupt a gene by
"classical" or PCR-based methods. The "classical" method of gene knockout is
described by Rothstein (1991), herein incorporated by reference. However, it is
preferable to use a PCR-based deletion method because it is faster and less labor 
intensive. 

A preferred method to delete a gene is a one-step, polymerase chain
reaction (PCR) based gene deletion method (Rothstein, 1991). Gene specific primer
pairs are designed for PCR amplification of the plasmid pFA6a-KanMX4 (Wach et al.,
1994), which teachings are herein incorporated by reference. The 3' ends of the
upstream and downstream gene specific primers have been designed to include 18
basepairs (bp) and 19 bp, respectively, of nucleotide homology flanking the KanMX
gene of the plasmid pFA6a-KanMX4 template. All of the gene specific primer pairs
contain these complementary sequences, such that the same plasmid pFA6a-KanMX4
template can be used for all of the first round PCR reactions. At their 5' ends, the
primers each have gene specific sequence homologies. The upstream primer contains a
nucleotide sequence which includes the start codon of the gene to be knocked out and
the sequence immediately upstream of the start codon. The downstream primer
contains a nucleotide sequence which includes the stop codon of the gene and the
sequence immediately downstream of the stop codon. For each set of primers, the
sequences of the gene are derived from the 5' and 3' ends of the target DNA sequence.
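
The primer layout described above, with gene specific sequence at the 5' end and
marker homology at the 3' end, may be sketched as follows. All sequences shown are
made-up placeholders and are not the actual pFA6a-KanMX4 or yeast sequences; the
function and parameter names are likewise hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def deletion_primers(upstream_flank, downstream_flank, marker_start, marker_end):
    """Assemble a primer pair for amplifying a marker with gene specific tails.

    All four arguments are top-strand sequences written 5' to 3':
    upstream_flank ends with the start codon of the gene to be deleted,
    downstream_flank begins with its stop codon, and marker_start and
    marker_end are the template sequences at the two edges of the marker.
    """
    # Forward primer: gene specific 5' end followed by marker homology.
    upstream_primer = upstream_flank.upper() + marker_start.upper()
    # Reverse primer anneals to the top strand, so it is the reverse
    # complement of the marker edge joined to the downstream flank.
    downstream_primer = reverse_complement(marker_end + downstream_flank)
    return upstream_primer, downstream_primer


# Placeholder sequences only (18 nt and 19 nt of marker homology).
placeholder_marker_start = "ACGTACGTACGTACGTAC"
placeholder_marker_end = "TTGCATGCATGCATGCATG"
up_primer, down_primer = deletion_primers(
    upstream_flank="AAACCCGGGATG",    # ends with a start codon
    downstream_flank="TAAGGGTTTCCC",  # begins with a stop codon
    marker_start=placeholder_marker_start,
    marker_end=placeholder_marker_end,
)
```

Because the marker homology is the same for every gene, only the gene specific
flanking sequences change from one primer pair to the next.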

The upstream and downstream primers are then used to amplify the
pFA6a-KanMX4 template by PCR using standard conditions for PCR. Hybridization conditions
for specific gene-specific primers can be experimentally determined, or estimated by a
number of formulas. One such formula is Tm = 81.5 + 16.6 (log10[Na+]) + 0.41
(fraction G + C) - (600/N). See Sambrook et al., pages 11.46-11.47. The products of
the first round PCR reactions are DNA molecules containing the KanMX marker
(conferring resistance to the drug G-418 in S. cerevisiae) flanked on both ends by 18
bp of gene specific sequences.
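
The melting temperature formula given above can be evaluated as in the following
sketch. The sketch assumes that the 0.41 coefficient is applied to G + C content
expressed as a percentage (0 to 100), that [Na+] is a molar concentration and that N is
the primer length in nucleotides; these readings, the function names and the example
sequence are assumptions made for illustration.

```python
import math


def gc_percent(sequence):
    """Percentage of G and C residues in a DNA sequence."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    return 100.0 * sum(base in "GC" for base in seq) / len(seq)


def melting_temperature(sodium_molar, gc_pct, length):
    """Tm = 81.5 + 16.6*log10([Na+]) + 0.41*(%G+C) - 600/N, in degrees C."""
    if sodium_molar <= 0 or length <= 0:
        raise ValueError("sodium concentration and length must be positive")
    return 81.5 + 16.6 * math.log10(sodium_molar) + 0.41 * gc_pct - 600.0 / length


# Hypothetical 20-nucleotide primer with 50% G + C content in 1 M Na+.
example_primer = "ATGCATGCATGCATGCATGC"
example_tm = melting_temperature(
    sodium_molar=1.0,
    gc_pct=gc_percent(example_primer),
    length=len(example_primer),
)
```

Lower salt concentrations and shorter primers both reduce the estimate, which is
consistent with the recommendation that conditions may also be determined
experimentally.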

The gene specific flanking sequences are extended during the second 
round PCR reactions. The sequences of the two gene specific PCR primers are 
derived from the 45 bp immediately upstream (including the start codon) and the 45 bp 
immediately downstream (including the stop codon) of each gene. Thus, following the 
second round of PCR the product contains the KanMX marker flanked by 45 bp of 
gene specific sequences corresponding to the sequences flanking the gene's ORF. The
PCR products are purified by an isopropanol precipitation, and shipped with the 
analytical primers (see below) to the consortium members on dry ice. The precipitated 
PCR products are resuspended in TE buffer (10 mM Tris-HCl [pH 7.6], 1 mM 
EDTA). 
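
The derivation of the 45 bp gene specific segments described above may be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the procedure actually used: the function names, the 0-based end-exclusive coordinate convention, and the assumption that the ORF lies on the top strand of the supplied sequence are all hypothetical. The downstream segment is returned as its reverse complement on the further assumption that the downstream primer anneals to the top strand.

```python
# Translation table for complementing unambiguous DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an unambiguous DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def second_round_flanks(chromosome: str, orf_start: int, orf_end: int,
                        flank: int = 45):
    """Return (upstream_segment, downstream_primer_segment).

    chromosome: top-strand sequence containing the ORF (assumption).
    orf_start, orf_end: 0-based, end-exclusive ORF coordinates, so that
    chromosome[orf_start:orf_start + 3] is the start codon and
    chromosome[orf_end - 3:orf_end] is the stop codon (assumption).
    """
    chrom = chromosome.upper()
    up_end = orf_start + 3        # segment ends with the start codon
    down_start = orf_end - 3      # segment begins with the stop codon
    if up_end - flank < 0 or down_start + flank > len(chrom):
        raise ValueError("not enough flanking sequence for the requested length")
    upstream = chrom[up_end - flank:up_end]
    downstream = chrom[down_start:down_start + flank]
    return upstream, reverse_complement(downstream)
```

An ORF on the opposite strand could be handled by first taking the reverse complement of the chromosome sequence and converting the coordinates accordingly.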

The various mutations are constructed in two related Saccharomyces 
cerevisiae strains, BY4741 (MATa his3Δ1 leu2Δ0 met15Δ0 ura3Δ0) and BY4743
(MATa/MATα his3Δ1/his3Δ1 leu2Δ0/leu2Δ0 LYS2/lys2Δ0 met15Δ0/MET15
ura3Δ0/ura3Δ0) (Brachmann et al., 1998). Both of these strains are transformed with
the PCR products by the lithium acetate method as described by Ito et al., 1983, and 
Schiestl and Gietz, 1989, herein incorporated by reference. The flanking, gene- 
specific yeast sequences target the integration event by homologous recombination to 
the desired locus (Figure 1). Transformants are selected on rich medium (YPD) which 
contains G-418 (Geneticin, Life Technologies, Inc.) as described by Guthrie and Fink, 
1991, herein incorporated by reference. Ideally, independent mutations are isolated in
the haploid (BY4741) and the diploid (BY4743) strains. The heterozygous mutant 
diploid strain is then sporulated, and subjected to tetrad analysis (Sherman, 1991; 
Sherman and Wakem, 1991, herein incorporated by reference). This allows for the 
isolation of the mutation in a MATα haploid strain. The two independently isolated
MATa and MATα haploid strains are then mated to create a homozygous mutant
diploid strain.

Methods to Characterize Yeast Gene Function

One of skill in the art will recognize that a number of methods can be 
used to characterize the function of a yeast gene. In general, the preferred strategy 
depends upon the assumptions made regarding the function of the gene. For example, 
if one creates a conditional allele of the gene, then one can engineer a mutant strain 
wherein the wildtype allele has been replaced by a conditional allele. See, e.g., Stark 
(1998). The strain is constructed and propagated under the permissive condition, and
then the strain is switched to the non-permissive (or restrictive) condition and the effects
upon the cell's phenotype are monitored. This can be done in a haploid cell, or in a
diploid cell as either a homozygous or heterozygous mutant. 

A preferred method of characterizing the function of a gene is to
knock out the gene completely and then analyze the knockout yeast strain by tetrad
analysis. This method is preferred because one does not need to be able to engineer a 
conditional allele. Furthermore, as the knockout is a null allele, one is assured that it is 
the null phenotype that is assessed, rather than a phenotype resulting from a potentially 
hypomorphic conditional allele. In addition, a complete knockout of the gene can be 
constructed in a diploid strain where the potentially essential function of the gene is 
complemented by the second copy of the gene. 

Once the knockout has been constructed as a heterozygous mutant, the
effects of the mutation are assessed in the haploid spores. Tetrad analysis of the haploid
spores allows for the genetic characterization of a mutation because one can determine 
the effect of the homozygous gene linked to the knockout marker (G-418 resistance). 
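
As a hedged illustration of how tetrad data from a heterozygous knockout diploid might be tabulated, the following Python sketch classifies one tetrad from the viability and G-418 resistance of its four spores. The function name, the encoding of spores, and the wording of the categories are assumptions of this sketch; the interpretation reflects ordinary 2:2 Mendelian segregation of a single marked locus and is not a substitute for the cited methods.

```python
from typing import List, Optional


def classify_tetrad(spores: List[Optional[bool]]) -> str:
    """Classify one tetrad from a heterozygous knockout diploid.

    spores: four entries, one per spore (hypothetical encoding):
      None  = spore did not form a colony (inviable),
      True  = viable and G-418 resistant (carries the KanMX knockout),
      False = viable and G-418 sensitive (carries the wild type allele).
    """
    if len(spores) != 4:
        raise ValueError("a tetrad has exactly four spores")
    viable = [s for s in spores if s is not None]
    resistant = sum(1 for s in viable if s)
    if len(viable) == 4 and resistant == 2:
        return "2:2 marker segregation; knockout spores viable"
    if len(viable) == 2 and resistant == 0:
        return "2 viable : 2 inviable; consistent with an essential gene"
    return "unexpected pattern; re-examine"
```

In the first category the G-418 resistant spores can be tested further for conditional phenotypes, while the second category suggests that the deleted gene is required for viability under the growth conditions used.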

Any of a number of different tests can be performed to determine the 
effect of knocking out the selected target gene. For instance, one can determine 
whether the yeast cell is more or less responsive to various pharmaceutical compounds 
(e.g., see Figure 4), pH, salinity, osmotic pressure, temperature or nutritional 
conditions. One can determine whether the knockout results in a different observable 
phenotype (e.g., see Figure 22). In addition, yeast cells can be tested for their ability 
to mate, sporulate and bud relative to a wild type control. Thus, these tests may 
provide important information regarding the function of the target gene.

Methods to Identify Potential Homologs in Other Organisms

Once a gene has been identified as a potential target, one can determine 
whether the gene from yeast has homologs in other organisms, such as humans,
non-human mammals, other vertebrates such as fish, insects, plants, or other fungi.

One method of determining whether an S. cerevisiae gene has
homologs is by the use of low stringency hybridization and washing. In general,
genomic DNA or cDNA libraries can be screened using probes derived from the target
S. cerevisiae gene using methods known in the art. See above and pages 8.46-8.49
and 9.46-9.58 of Sambrook et al., 1989, herein incorporated by reference. Preferably,
genomic DNA libraries are screened because cDNA libraries generally will not contain
all the mRNA species an organism can make. Genomic DNA libraries from a variety
of different organisms, such as plants, fungi, insects, and various mammalian species
are commercially available and can be screened. This method is useful for determining
whether there are homologs in organisms whose DNA sequences have not been
characterized extensively.

A second method of determining whether an S. cerevisiae gene has 
homologs is through the use of degenerate PCR. In this method, degenerate 
oligonucleotides that encode short amino acid sequences of the S. cerevisiae gene are 
made. Methods of preparing degenerate oligonucleotides and using them in PCR to
isolate uncloned genes are well known in the art (see Sambrook, pages 14.7-14.8, and
Crawley et al., 1997, pages 4.2.1-4.2.5, herein incorporated by reference).
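
One practical consideration in designing degenerate oligonucleotides is the number of distinct nucleotide sequences needed to encode a given short amino acid sequence. The following Python sketch, offered only as an illustration, computes that number and locates the least degenerate window in a protein. It assumes the standard genetic code and the twenty standard amino acids in one-letter code; the function names and the default window of six residues are hypothetical.

```python
# Number of codons per amino acid in the standard genetic code.
CODON_COUNT = {
    "A": 4, "R": 6, "N": 2, "D": 2, "C": 2, "Q": 2, "E": 2, "G": 4,
    "H": 2, "I": 3, "L": 6, "K": 2, "M": 1, "F": 2, "P": 4, "S": 6,
    "T": 4, "W": 1, "Y": 2, "V": 4,
}


def degeneracy(peptide: str) -> int:
    """Number of distinct coding sequences for a peptide (standard code)."""
    total = 1
    for aa in peptide.upper():
        if aa not in CODON_COUNT:
            raise ValueError(f"unsupported residue: {aa!r}")
        total *= CODON_COUNT[aa]
    return total


def least_degenerate_window(protein: str, width: int = 6):
    """Return (start, peptide, fold) for the window of lowest degeneracy.

    Ties are resolved in favor of the window closest to the N-terminus.
    """
    if width < 1 or len(protein) < width:
        raise ValueError("protein is shorter than the requested window")
    windows = [(degeneracy(protein[i:i + width]), i)
               for i in range(len(protein) - width + 1)]
    fold, start = min(windows)
    return start, protein[start:start + width].upper(), fold
```

Windows rich in methionine and tryptophan, each encoded by a single codon, give the lowest degeneracy, whereas leucine, serine and arginine each contribute a factor of six.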

The most preferred method is to compare the sequence of the S.
cerevisiae gene to sequences from other organisms. Either the nucleotide sequence of
the gene or its encoded amino acid sequence is compared to the sequences from other
organisms. Preferably, the encoded amino acid sequence of the yeast gene is compared
to amino acid sequences from other organisms. The sequence of the yeast gene can be
compared by a number of different algorithms well known in the art. In general,
computer programs designed for sequence analysis are used for the purpose of
comparing the sequence of interest to a large database of other sequences. Any
computer program designed for the purpose of sequence comparison can be used in
this method. Some computer programs, such as Fasta, produce results that are
typically presented as "% sequence identity." Other computer programs, such as
blastp, produce results presented as "p-values." Preferably, the target gene sequence
will be compared to other sequences using the blastp algorithm.

Nucleotide and amino acid sequences of target genes may be compared
to vertebrate sequences, including human and non-human mammalian sequences, as
well as plant and insect sequences using any one of the large number of programs
known in the art for comparing nucleotide and amino acid sequences to sequences in a
database. Examples of such programs are Fasta and blastp, discussed above.
Examples of databases which can be searched include GenBank-EMBL, SwissProt,
DDBJ, GeneSeq, and EST databases, as well as databases containing combinations of
these databases.
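
To make the notion of "% sequence identity" concrete, the following minimal Python sketch computes it for two sequences that have already been aligned. It is not an implementation of Fasta or blastp. The function name, the use of '-' as the gap character, and the choice to count only columns in which neither sequence has a gap are assumptions of this sketch; programs differ in which denominator they report.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity between two pre-aligned sequences.

    Only alignment columns without a gap ('-') in either sequence are
    compared (an assumption of this sketch; other conventions exist).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == "-" or y == "-":
            continue  # skip gapped columns
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped aligned positions to compare")
    return 100.0 * matches / compared
```

For the aligned pair MKT-LV and MRTALV, five columns are ungapped and four of them match, giving 80% identity.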

As a further characterization, any potential homologs from other
organisms can be assessed for their ability to functionally complement the yeast
mutant. This can be achieved by first cloning the homolog into a S. cerevisiae
expression vector by standard methods. This plasmid can then be transformed into the
heterozygous mutant diploid strain. Upon sporulation and tetrad dissection, the ability
of the homolog to complement the yeast function is determined by whether or not the
homolog restores the wildtype function in the haploid spores carrying the yeast
knockout. The ability of the homolog to complement the yeast mutant would
indicate shared function(s) and suggest that the homolog may be part of a similar
pathway in the other organism.

Nucleic Acids, Vectors and Production of Recombinant Polypeptides 

The present invention provides nucleic acids and recombinant DNA
vectors which comprise S. cerevisiae RIG and target gene DNA sequences.
Specifically, vectors comprising all or portions of the DNA sequence of HES1,
YMR134w, YER034w, YJL105w, YKL077w, YGR046w, YJR041c, YER044c and
YLR100w are provided. The vectors of this invention also include those comprising
DNA sequences which hybridize under stringent conditions to the HES1, YMR134w,
YER034w, YJL105w, YKL077w, YGR046w, YJR041c, YER044c and YLR100w gene
sequences, and conservatively modified variations thereof.

The nucleic acids of this invention include single-stranded and
double-stranded DNA, RNA, oligonucleotides, antisense molecules, or hybrids thereof and
may be isolated from biological sources or synthesized chemically or by recombinant
DNA methodology. The nucleic acids, recombinant DNA molecules and vectors of
this invention may be present in transformed or transfected cells, cell lysates, or in
partially purified or substantially pure forms.

DNA sequences may be expressed by operatively linking them to an
expression control sequence in an appropriate expression vector and employing that
expression vector to transform an appropriate unicellular host. Expression control
sequences are sequences which control the transcription, post-transcriptional events
and translation of DNA sequences. Such operative linking of a DNA sequence of this
invention to an expression control sequence, of course, includes, if not already part of
the DNA sequence, the provision of a translation initiation codon, ATG, in the correct
reading frame upstream of the DNA sequence.
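
The requirement that an ATG be provided in the correct reading frame can be illustrated by the following Python sketch, which prepends an initiation codon when one is absent and checks that no stop codon interrupts the frame. The function name is hypothetical, and the sketch assumes a coding sequence supplied 5' to 3' in frame with its first nucleotide and the three standard stop codons; it does not model any particular vector.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard stop codons (assumption)


def with_initiation_codon(coding_seq: str) -> str:
    """Return the sequence with an in-frame ATG at its 5' end.

    Assumes coding_seq is read in frame from its first nucleotide. A stop
    codon is permitted only as the final codon.
    """
    seq = coding_seq.upper()
    if not seq or len(seq) % 3 != 0:
        raise ValueError("sequence length must be a non-zero multiple of three")
    if not seq.startswith("ATG"):
        seq = "ATG" + seq  # adding a whole codon preserves the reading frame
    codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    if any(codon in STOP_CODONS for codon in codons[:-1]):
        raise ValueError("in-frame stop codon before the end of the sequence")
    return seq
```

Because the added ATG is a complete codon, the downstream codons are read exactly as in the input sequence.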

A wide variety of host/expression vector combinations may be 
employed in expressing the DNA sequences of this invention. Useful expression 
vectors, for example, may consist of segments of chromosomal, non-chromosomal and 
synthetic DNA sequences.

Useful expression vectors for bacterial hosts include bacterial plasmids,
such as those from E. coli, including pBluescript, pGEX-2T, pUC vectors, col E1,
pCR1, pBR322, pMB9 and their derivatives, wider host range plasmids, such as RP4,
phage DNAs, e.g., the numerous derivatives of phage lambda, e.g., NM989, λGT10
and λGT11, and other phages, e.g., M13 and filamentous single stranded phage DNA.
In yeast, vectors include Yeast Integrating plasmids (e.g., YIp5) and Yeast Replicating
plasmids (the YRp and YEp series plasmids), Yeast centromere plasmids (the YCp
series plasmids), pGPD-2, 2μ plasmids and derivatives thereof, and improved shuttle
vectors such as those described in Gietz and Sugino, Gene, 74, pp. 527-34 (1988)
(YIplac, YEplac and YCplac). Expression in mammalian cells can be achieved using a
variety of plasmids, including pSV2, pBC12BI, and p91023, as well as lytic virus
vectors (e.g., vaccinia virus, adenovirus, and baculovirus), episomal virus vectors
(e.g., bovine papillomavirus), and retroviral vectors (e.g., murine retroviruses). Useful
vectors for insect cells include baculoviral vectors and pVL 941.

In addition, any of a wide variety of expression control sequences --
sequences that control the expression of a DNA sequence when operatively linked to
it -- may be used in these vectors to express the DNA sequences of this invention.
Such useful expression control sequences include the expression control sequences 
associated with structural genes of the foregoing expression vectors. Expression 
control sequences that control transcription include, e.g., promoters, enhancers and 
transcription termination sites. Expression control sequences that control post- 
transcriptional events include splice donor and acceptor sites and sequences that 
modify the half-life of the transcribed RNA, e.g., sequences that direct poly(A) 
addition or binding sites for RNA-binding proteins. Expression control sequences that 
control translation include ribosome binding sites, sequences which direct expression 
of the polypeptide to particular cellular compartments, and sequences in the 5' and 3' 
untranslated regions that modify the rate or efficiency of translation. 

Examples of useful expression control sequences include, for example, 
the early and late promoters of SV40 or adenovirus, the lac system, the trp system, the
TAC or TRC system, the T3 and T7 promoters, the major operator and promoter
regions of phage lambda, the control regions of fd coat protein, the promoter for
3-phosphoglycerate kinase or other glycolytic enzymes, the promoters of acid
phosphatase, e.g., Pho5, the promoters of the yeast α-mating system, the GAL1 or
GAL10 promoters, and other constitutive and inducible promoter sequences known to
control the expression of genes of prokaryotic or eukaryotic cells or their viruses, and
various combinations thereof. See, e.g., The Molecular Biology of the Yeast
Saccharomyces (eds. Strathern, Jones and Broach) Cold Spring Harbor Lab., Cold
Spring Harbor, N.Y. for details on yeast molecular biology in general and on yeast
expression systems (pp. 181-209) (incorporated herein by reference).

DNA vector design for transfection into mammalian cells should include
appropriate sequences to promote expression of the gene of interest, including: 
appropriate transcription initiation, termination and enhancer sequences; efficient RNA 
processing signals such as splicing and polyadenylation signals; sequences that stabilize 
cytoplasmic mRNA; sequences that enhance translation efficiency (i.e., Kozak 
consensus sequence); sequences that enhance protein stability; and when desired, 
sequences that enhance protein secretion. A great number of expression control 
sequences -- constitutive, inducible and/or tissue-specific -- are known in the art and
may be utilized. For eukaryotic cells, expression control sequences typically include a 
promoter, an enhancer derived from immunoglobulin genes, SV40, cytomegalovirus, 
etc., and a polyadenylation sequence which may include splice donor and acceptor 
sites. Substantial progress in the development of mammalian cell expression systems 
has been made in the last decade and many aspects of the system are well 
characterized. 

Preferred DNA vectors also include a marker gene and means for 
amplifying the copy number of the gene of interest. DNA vectors may also comprise 
stabilizing sequences (e.g., ori- or ARS-like sequences and telomere-like sequences), 
or may alternatively be designed to favor directed or non-directed integration into the 
host cell genome. In a preferred embodiment, DNA sequences of this invention are 
inserted in frame into an expression vector that allows high level expression of an RNA 
which encodes a fusion protein comprising the polypeptide encoded by the DNA sequence of interest.

Of course, not all vectors and expression control sequences will 
function equally well to express the DNA sequences of this invention. Neither will all 
hosts function equally well with the same expression system. However, one of skill in 
the art may make a selection among these vectors, expression control sequences and 
hosts without undue experimentation and without departing from the scope of this 
invention. For example, in selecting a vector, the host must be considered because the 
vector must be replicated in it. The vector's copy number, the ability to control that 
copy number, the ability to control integration, if any, and the expression of any other
proteins encoded by the vector, such as antibiotic or other selection markers, should
also be considered.

In selecting an expression control sequence, a variety of factors should
also be considered. These include, for example, the relative strength of the sequence,
its controllability, and its compatibility with the DNA sequence of this invention,
particularly with regard to potential secondary structures. Unicellular hosts should be
selected by consideration of their compatibility with the chosen vector, the toxicity of
the product coded for by the DNA sequences of this invention, their secretion
characteristics, their ability to fold the polypeptide correctly, their fermentation or
culture requirements, and the ease of purification from them of the products coded for
by the DNA sequences of this invention.

Within these parameters, one of skill in the art may select various 
vector/expression control sequence/host combinations that will express the DNA 
sequences of this invention in fermentation or in other large scale cultures. 

Given the strategies described herein, one of skill in the art can
construct a variety of vectors and nucleic acid molecules comprising functionally
equivalent nucleic acids. DNA cloning and sequencing methods are well known to
those of skill in the art and are described in an assortment of laboratory manuals,
including Sambrook et al., supra, 1989; and Ausubel et al., 1994 Supplement. Product
information from manufacturers of biological, chemical and immunological reagents
also provides useful information.

The recombinant DNA molecules and more particularly, the expression
vectors of this invention may be used to express the RIG and target genes from S.
cerevisiae as recombinant polypeptides in a heterologous host cell. The polypeptides
of this invention may be full-length or less than full-length polypeptide fragments
recombinantly expressed from the DNA sequences according to this invention. Such
polypeptides include variants and muteins having biological activity. The polypeptides
of this invention may be soluble, or may be engineered to be membrane- or
substrate-bound using techniques well known in the art.



WO 00/58521 PCT/US00/08604



Particular details of the transfection, expression and purification of
recombinant proteins are well documented and are understood by those of skill in the
art. Further details on the various technical aspects of each of the steps used in
recombinant production of foreign genes in mammalian cell expression systems can be
found in a number of texts and laboratory manuals in the art. See, e.g., Ausubel et al.,
1989, herein incorporated by reference.

Transformation and other methods of introducing nucleic acids into a
host cell (e.g., transfection, electroporation, liposome delivery, membrane fusion
techniques, high velocity DNA-coated pellets, viral infection and protoplast fusion) can
be accomplished by a variety of methods which are well known in the art (see, for
instance, Ausubel, supra, and Sambrook, supra). Bacterial, yeast, plant or mammalian
cells are transformed or transfected with an expression vector, such as a plasmid, a
cosmid, or the like, wherein the expression vector comprises the DNA of interest.
Alternatively, the cells may be infected by a viral expression vector comprising the
DNA or RNA of interest. Depending upon the host cell, vector, and method of
transformation used, transient or stable expression of the polypeptide will be
constitutive or inducible. One having ordinary skill in the art will be able to decide
whether to express a polypeptide transiently or stably, and whether to express the
protein constitutively or inducibly.

A wide variety of unicellular host cells are useful in expressing the DNA
sequences of this invention. These hosts may include well known eukaryotic and
prokaryotic hosts, such as strains of E. coli, Pseudomonas, Bacillus, Streptomyces,
fungi, yeast, insect cells such as Spodoptera frugiperda (SF9), animal cells such as
CHO, BHK, MDCK and various murine cells, e.g., 3T3 and WEHI cells, African green
monkey cells such as COS 1, COS 7, BSC 1, BSC 40, and BMT 10, and human cells
such as VERO, WI38, and HeLa cells, as well as plant cells in tissue culture.

Expression of recombinant DNA molecules according to this invention
may involve post-translational modification of a resultant polypeptide by the host cell.
For example, in mammalian cells expression might include, among other things,
glycosylation, lipidation or phosphorylation of a polypeptide, or cleavage of a signal
sequence to produce a "mature" protein. Accordingly, the polypeptide expression
products of this invention encompass full-length polypeptides and modifications or
derivatives thereof, such as glycosylated versions of such polypeptides, mature proteins
and polypeptides retaining a signal peptide. The present invention also provides for
biologically active fragments of the polypeptides. Sequence analysis or genetic
manipulation may identify those domains responsible for the function of the protein in
yeast. Thus, the invention encompasses the production of biologically active
fragments. The invention also encompasses fragments of the polypeptides which
would be valuable as antigens for the production of antibodies, or as competitors for
antibody binding.

The polypeptides of this invention may be fused to other molecules,
such as genetic, enzymatic or chemical or immunological markers such as epitope tags.
Fusion partners include, inter alia, myc, hemagglutinin (HA), GST, immunoglobulins,
β-galactosidase, biotin, trpE, protein A, β-lactamase, α-amylase, maltose binding
protein, alcohol dehydrogenase, polyhistidine (for example, six histidines at the amino
and/or carboxyl terminus of the polypeptide), lacZ, green fluorescent protein (GFP),
yeast α mating factor, GAL4 transcription activation or DNA binding domain,
luciferase, and serum proteins such as ovalbumin, albumin and the constant domain of
IgG. See, e.g., Godowski et al., 1988, and Ausubel et al., supra. Fusion proteins may
also contain sites for specific enzymatic cleavage, such as a site that is recognized by
enzymes such as Factor XIII, trypsin, pepsin, or any other enzyme known in the art.
Fusion proteins will typically be made by recombinant nucleic acid methods, as
described above, chemically synthesized using techniques such as those described in
Merrifield, 1963, herein incorporated by reference, or produced by chemical
cross-linking.

Tagged fusion proteins permit easy localization, screening and specific
binding via the epitope or enzyme tag. See Ausubel, 1991, Chapter 16. Some tags
allow the protein of interest to be displayed on the surface of a phagemid, such as
M13, which is useful for panning agents that may bind to the desired protein targets.
Thus, fusion proteins are useful for screening potential agents using the proteins
encoded by the target genes.

One advantage of fusion proteins is that an epitope or enzyme tag can 
simplify purification. These fusion proteins may be purified, often in a single step, by 
affinity chromatography. For example, a His6-tagged protein can be purified on a Ni
affinity column and a GST fusion protein can be purified on a glutathione affinity
column. Similarly, a fusion protein comprising the Fc domain of IgG can be purified
on a Protein A or Protein G column and a fusion protein comprising an epitope tag 
such as myc can be purified using an immunoaffinity column containing an anti-c-myc 
antibody. It is preferable that the epitope tag be separated from the protein encoded by 
the target gene by an enzymatic cleavage site that can be cleaved after purification. 
A second advantage of fusion proteins is that the epitope tag can be used to bind the 
fusion protein to a plate or column through an affinity linkage for screening targets. 

In addition, fusion proteins comprising the constant domain of IgG or 
other serum proteins can increase a protein's half-life in circulation for use 
therapeutically. Fusion proteins comprising a targeting domain can be used to direct 
the protein to a particular cellular compartment or tissue target in order to increase the 
efficacy of the functional domain. See, e.g., U.S. Pat. No. 5,668,255, which discloses 
a fusion protein containing a domain which binds to an animal cell coupled to a 
translocation domain of a toxin protein. Fusion proteins may also be useful for 
improving antigenicity of a protein target. Examples of making and using fusion 
proteins are found in U.S. Pat. Nos. 5,225,538, 5,821,047, and 5,783,398, which are 
hereby incorporated by reference. 

Production of Polypeptide Fragments, Derivatives and Muteins and Biological 
Assays Thereof 

Fragments, derivatives and muteins of polypeptides encoded by the RIG 
and target genes can be produced recombinantly or chemically, as discussed above. 
One can produce fragments of a polypeptide encoded by a target gene by truncating the
DNA encoding the target gene and then expressing it recombinantly. Alternatively,
one can produce a fragment by chemically synthesizing a portion of the full-length
polypeptide. One may also produce a fragment by enzymatically cleaving the
polypeptide. Methods of producing polypeptide fragments are well-known in the art
(see, e.g., Sambrook et al. and Ausubel et al. supra).

One may produce muteins of a polypeptide encoded by a target gene by 
introducing mutations into the DNA sequence of the gene and then expressing it 
recombinantly. These mutations may be targeted, in which particular encoded amino 
acids are altered, or may be untargeted, in which random encoded amino acids within 
the polypeptide are altered. Muteins with random amino acid alterations can be 
screened for a particular biological activity. Methods of producing muteins with 
targeted or random amino acid alterations are well known in the art, see e.g.,
Sambrook et al., Ausubel et al., supra, and U.S. Pat. No. 5,223,408, herein
incorporated by reference. Methods of producing polypeptide derivatives are well
known in the art, see above.

There are a number of methods known in the art to determine whether 
fragments, muteins and derivatives of polypeptides encoded by a target gene have the
same, enhanced or decreased biological activity as the wild type polypeptide. One of 
the simplest assays involves determining whether the fragment, mutein or derivative 
can complement the gene function in a cell which does not contain the target gene. 
For instance, one can introduce a DNA encoding a fragment or mutein of a 
polypeptide encoded by a gene into a mutant yeast strain which has the gene of interest 
deleted (see above under "Methods of Producing Mutant Yeast Strains"). If 
introduction of the DNA encoding the fragment or mutein permits the mutant yeast 
strain to regain its wildtype phenotype, then the fragment or mutein is biologically 
active, and complements the deleted gene. 

In one type of screening assay, the target gene or a fragment thereof 
can be used as the "bait" in a two-hybrid screen to identify molecules that physically 
interact with the product of the target gene. See Chien et al. (1991).

In addition, one may generate genome expression profiles of yeast
strains to characterize the gene's function. In order to generate such profiles, a
non-functional or conditional allele of the gene in a yeast strain must be produced. The
conditional or non-functional allele may be constructed by any technique known in the
art, including deleting the gene as described above, making a temperature-sensitive
allele of the gene or operably linking the gene to an inducible promoter for regulated
expression. If the yeast strain contains a non-functional allele, a genome expression
profile of the mutant strain is compared to that of a wild type strain. If the yeast strain
contains a conditional allele, the yeast strain is first grown under the permissive
condition to permit expression of the functional product of the target gene. Then, the
yeast strain is shifted to the nonpermissive condition, in which the product of the target
gene is not made or is non-functional. The genome expression profile of the yeast
strain under the nonpermissive condition may be compared to that of the same yeast strain
grown under permissive conditions or a wildtype yeast strain. Structure-function
studies can be performed wherein a library of mutant forms of the gene is screened for
the ability to complement the knock-out mutant strain.
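
One simple way to compare a mutant expression profile with a wild type profile is to compute a log ratio for each ORF measured in both strains. The following Python sketch is offered only as an illustration under stated assumptions: the function names, the pseudocount of 1.0, the two-fold threshold, and the placeholder ORF names used in the usage note are hypothetical, and no normalization or replicate handling is attempted.

```python
import math
from typing import Dict, List


def log2_ratios(mutant: Dict[str, float], wild_type: Dict[str, float],
                pseudocount: float = 1.0) -> Dict[str, float]:
    """log2(mutant / wild type) for every ORF measured in both profiles.

    The pseudocount (a hypothetical choice) avoids division by zero for
    ORFs with no detectable signal. Pass a positive pseudocount.
    """
    ratios = {}
    for orf in mutant.keys() & wild_type.keys():
        m, w = mutant[orf], wild_type[orf]
        if m < 0 or w < 0:
            raise ValueError(f"negative expression value for {orf}")
        ratios[orf] = math.log2((m + pseudocount) / (w + pseudocount))
    return ratios


def changed_orfs(ratios: Dict[str, float],
                 min_abs_log2: float = 1.0) -> List[str]:
    """ORFs whose expression differs by at least the given log2 threshold."""
    return sorted(orf for orf, r in ratios.items() if abs(r) >= min_abs_log2)
```

For example, with placeholder ORFs ORF_A, ORF_B and ORF_C measured at 7, 1 and 3 units in the mutant and at 1, 1 and 15 units in the wild type, the log2 ratios are 2, 0 and -2, so ORF_A and ORF_C would be reported as changed.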

Fragments, muteins and derivatives may also be micro-injected into a
mutant yeast strain in which the gene of interest is deleted to determine whether the
introduction of the fragment, mutein or derivative can complement the genetic defect.
Similarly, fragments, muteins and derivatives may be microinjected into other cell types
in which the homologous gene has been deleted.

Finally, if a particular biochemical activity of a polypeptide encoded by
a target gene is known, this activity can be measured for fragments, muteins or
derivatives of the polypeptide. For instance, if a target gene encodes a kinase, one
could measure the kinase activity of the wild type polypeptide and compare it to the
activity of a fragment, mutein or derivative.

Production of Antibodies 

The polypeptides encoded by the target genes of this invention may be
used to elicit polyclonal or monoclonal antibodies which bind to the target gene
product or a homolog from another species using a variety of techniques well known
to those of skill in the art. Alternatively, peptides corresponding to specific regions of
the polypeptide encoded by the target gene may be synthesized and used to create
immunological reagents according to well known methods.

Antibodies directed against the polypeptides of this invention are
immunoglobulin molecules or portions thereof that are immunologically reactive with
the polypeptide of the present invention. It should be understood that the antibodies of
this invention include antibodies immunologically reactive with fusion proteins.

Antibodies directed against a polypeptide encoded by a target gene may
be generated by immunization of a mammalian host. Such antibodies may be
polyclonal or monoclonal. Preferably they are monoclonal. Methods to produce
polyclonal and monoclonal antibodies are well known to those of skill in the art. For a
review of such methods, see Harlow and Lane (1988), Yelton et al. (1981), and
Ausubel et al. (1989) herein incorporated by reference. Determination of
immunoreactivity with a polypeptide encoded by a target gene may be made by any of
several methods well known in the art, including by immunoblot assay and ELISA.

Monoclonal antibodies with affinities of $10^{-8}$ M$^{-1}$ or preferably $10^{-9}$ to
$10^{-10}$ M$^{-1}$ or stronger are typically made by standard procedures as described, e.g., in
Harlow and Lane, 1988 or Goding, 1986. Briefly, appropriate animals are selected and
the desired immunization protocol followed. After the appropriate period of time, the
spleens of such animals are excised and individual spleen cells fused, typically, to
immortalized myeloma cells under appropriate selection conditions. Thereafter, the
cells are clonally separated and the supernatants of each clone tested for their
production of an appropriate antibody specific for the desired region of the antigen.

Other suitable techniques involve in vitro exposure of lymphocytes to
the antigenic polypeptides, or alternatively, selection of libraries of antibodies in
phage or similar vectors. See Huse et al., 1989. The polypeptides and antibodies of
the present invention may be used with or without modification. Frequently,
polypeptides and antibodies will be labeled by joining, either covalently or
non-covalently, a substance which provides for a detectable signal. A wide variety of
labels and conjugation techniques are known and are reported extensively in both the
scientific and patent literature. Suitable labels include radionuclides, enzymes,
substrates, cofactors, inhibitors, fluorescent agents, chemiluminescent agents, magnetic
particles and the like. Patents teaching the use of such labels include U.S. Patents
3,817,837; 3,850,752; 3,939,350; 3,996,345; 4,277,437; 4,275,149 and 4,366,241,
herein incorporated by reference. Also, recombinant immunoglobulins may be
produced (see U.S. Patent 4,816,567, herein incorporated by reference).

An antibody of this invention may also be a hybrid molecule formed
from immunoglobulin sequences from different species (e.g., mouse and human) or
from portions of immunoglobulin light and heavy chain sequences from the same
species. An antibody may be a single-chain antibody or a humanized antibody. It may
be a molecule that has multiple binding specificities, such as a bifunctional antibody
prepared by any one of a number of techniques known to those of skill in the art
including the production of hybrid hybridomas, disulfide exchange, chemical
cross-linking, addition of peptide linkers between two monoclonal antibodies, the
introduction of two sets of immunoglobulin heavy and light chains into a particular cell
line, and so forth.

The antibodies of this invention may also be human monoclonal
antibodies, for example those produced by immortalized human cells, by SCID-hu mice
or other non-human animals capable of producing "human" antibodies, or by the
expression of cloned human immunoglobulin genes. The preparation of humanized
antibodies is taught by U.S. Pat. Nos. 5,777,085 and 5,789,554, herein incorporated by
reference.

In sum, one of skill in the art, provided with the teachings of this
invention, has available a variety of methods which may be used to alter the biological
properties of the antibodies of this invention including methods which would increase
or decrease the stability or half-life, immunogenicity, toxicity, affinity or yield of a
given antibody molecule, or to alter it in any other way that may render it more suitable
for a particular application.
Therapeutic Methods Using Nucleic Acids Encoding Target Genes

Once a target gene has been identified in S. cerevisiae, the gene and its
nucleotide sequence can be exploited in a number of ways depending upon the nature
of the target gene. One method is to use the primary sequence of the target gene itself.
For instance, antisense oligonucleotides can be produced which are complementary to
the mRNA of the target gene. Antisense oligonucleotides can be used to inhibit
transcription or translation of a target yeast gene. Production of antisense
oligonucleotides effective for therapeutic use is well known in the art, see Agrawal et
al., 1998, Lavrovsky et al., 1997, and Crooke, 1998, herein incorporated by reference.
Antisense oligonucleotides are often produced using derivatized or modified
nucleotides in order to increase half-life or bioavailability.

The primary sequence of the target gene can also be used to design
ribozymes that can target and cleave specific target gene sequences. There are a
number of different types of ribozymes. Most synthetic ribozymes are
hammerhead, Tetrahymena or hairpin ribozymes. Methods of designing and using
ribozymes to cleave specific RNA species are known in the art, see Zhao et al., 1998,
Lavrovsky et al., 1997, and Eckstein, 1997, herein incorporated by reference. Although
hammerhead ribozymes are generally ineffective in yeast (Castanotto et al., 1998),
other types of ribozymes may be effective in yeast, and hammerhead and other types of
ribozymes are effective in other organisms.

As discussed above, one can use target yeast genes to identify 
homologous genes in plants and animals, including humans. Therefore, one can design 
ribozymes and antisense molecules to these genes from plants and animals, including 
humans. 

Methods Using Neutralizing Antibodies to Proteins Encoded by Target Genes

The protein encoded by the target gene can be used to elicit neutralizing
antibodies for use in inhibiting the function of the target protein. An antibody may be an
especially good inhibitor if the target gene of interest encodes a protein which is
expressed on the cell surface, such as an integral membrane protein. Although
polyclonal antibodies may be made, monoclonal antibodies are preferred. Monoclonal
antibodies can be screened individually in order to isolate those that are neutralizing or
inhibitory for the protein encoded by the target gene. Monoclonal antibodies also may
be screened for inhibition of a particular function of a protein. For instance, if it is
known that the target gene in yeast encodes an enzyme, one can identify antibodies
that inhibit the enzymatic activity. Alternatively, if the specific function of a target
gene is unknown, one can measure inhibition of the protein by determining the genome
expression profile for yeast cells contacted with the neutralizing antibody. Similarly,
one can screen antibodies which are directed against animal, plant or human proteins
for inhibition of the protein's activity in appropriate cells.

Monoclonal antibodies which inhibit a target protein in vitro may be
humanized for therapeutic use using methods well known in the art, see, e.g., U.S. Pat.
Nos. 5,777,085 and 5,789,554, herein incorporated by reference. Monoclonal
antibodies may also be engineered as single-chain antibodies for therapeutic use using
methods well known in the art, see, e.g., U.S. Pat. Nos. 5,091,513, 5,587,418,
and 5,608,039, herein incorporated by reference.

Neutralizing antibodies may also be used diagnostically. For instance,
the binding site of a neutralizing antibody to the protein encoded by the target gene can
be used to help identify domains that are required for the protein's activity. The
information about the critical domains of a target protein can be used to design
inhibitors that bind to the critical domains of the target protein. In addition,
neutralizing antibodies can be used to validate whether a potential inhibitor of a target
protein inhibits the protein in in vitro assays.



Methods of Identifying Functional Attributes of the Target 

Once a target gene in yeast is identified, the GRM (or an equivalent) is
used to help identify critical functional attributes of the gene. In order to determine the
particular transcripts a target gene modifies, one overexpresses the target gene in the
cells of the GRM. One may also overexpress a conditional allele of the gene in the
cells of the GRM. Then, one identifies a subset of genes that are either induced or
repressed by overexpression of the target gene. Methods for processing data using the
GRM are also disclosed in United States Patents 5,569,588 and 5,777,888; see also
United States Patent Application Serial No. 09/076,668, now pending. Once the genes
that are regulated by a target gene are identified, one can use this information in a
number of ways to identify potential inhibitors or activators of the target protein.
Alternatively, one may determine the genome expression profile of a cell that has a
mutation in a target gene, or a cell that has the endogenous target gene replaced either
with an altered allele or with the counterpart gene from another species. Similarly,
plant and animal GRMs, including human GRMs, overexpressing target genes can be
used in the same way to identify potential inhibitors or activators of the target protein
in these organisms.

Another method for isolating potential inhibitors or activators of a
target gene is to use information obtained from the "two-hybrid system" to identify and
clone genes encoding proteins that interact with the polypeptide encoded by the target
gene (see, e.g., Chien et al., 1991, incorporated herein by reference). The amino acid
sequences of the polypeptides identified by the two-hybrid system can be used to
design inhibitory peptides to the target protein. The "two-hybrid" system using
libraries of the appropriate species can also be used to identify and clone genes
encoding proteins that interact with the polypeptide encoded by the target genes.

Methods of Using Target Proteins

Recombinantly expressed target proteins or functional fragments
thereof can be used to screen libraries of natural, semisynthetic or synthetic
compounds. Particularly useful types of libraries include combinatorial small organic
molecule libraries, phage display libraries, and combinatorial peptide libraries.
Methods of determining whether components of the library bind to a particular
polypeptide are well known in the art. In general, the polypeptide target is attached to
a solid support surface by non-specific or specific binding. Specific binding can be
accomplished using an antibody which recognizes the protein that is bound to a solid
support, such as a plate or column. Alternatively, specific binding may be through an
epitope tag, such as GST binding to a glutathione-coated solid support, or IgG fusion
protein binding to a Protein A solid support. Alternatively, the recombinantly
expressed protein or fragments thereof may be expressed on the surface of phage, such
as M13. A library in mobile phase is incubated under conditions to promote specific
binding between the target and a compound. Compounds which bind to the target can
then be identified. Alternatively, the library is attached to a solid support and the
polypeptide target is in the mobile phase.

Binding between a compound and target can be determined by a
number of methods. The binding can be identified by such techniques as competitive
ELISAs or RIAs, for example, wherein the binding of a compound to a target will
prevent an antibody to the target from binding. These methods are well known in the
art, see, e.g., Harlow and Lane, supra. Another method is to use BiaCORE
(BiaCORE) to measure interactions between a target and a compound using methods
provided by the manufacturer. A preferred method is automated high throughput
screening, see, e.g., Burbaum et al., 1997, and Schullek et al., 1997, herein
incorporated by reference.

Once a compound that binds to a target is identified, one then
determines whether the compound inhibits the activity of the target. If a biological
function for the target protein is known, one could determine whether the compound
inhibited the biological activity of the protein. For instance, if it is known that the
target protein is an enzyme, one can measure the inhibition of enzymatic activity in the
presence of the potential inhibitor.

In a preferred embodiment, the target gene is selected from YMR134w,
YER034w, YJL105w, YKL077w, YGR046w, YJR041c, YER044c and YLR100w and
their mammalian homologs.

Another embodiment of the invention is to use the recombinantly
expressed protein for rational drug design. The structure of the recombinant protein
may be determined using x-ray crystallography or nuclear magnetic resonance (NMR).
Alternatively, one could use computer modeling to determine the structure of the
protein. The structure can be used in rational drug design to design potential inhibitory
compounds of the target (see, e.g., Clackson, Mattos et al., Hubbard, Cunningham et
al., Kubinyi, Kleinberg et al., all herein incorporated by reference).

In another embodiment, potential inhibitors of a regulon target gene can
be identified by the following steps:

a) creating a host cell in which the target gene has been altered or
inactivated by mutation;

b) comparing gene expression profiles in the mutated host cell to those in
a host cell which expresses the normal target gene;

c) identifying one or more potential target-dependent reporter genes
whose expression is altered in the host cell in which the target gene has
been altered or inactivated compared to the host cell which expresses
the normal target gene; and

d) screening one or more compounds for their effects on expression of the
target-dependent reporter gene.

If expression of the target-dependent reporter gene increases in the host
cell harboring an altered or inactivated target gene, then a potential inhibitor of the
regulon target gene will increase expression of the target-dependent reporter gene, and
if expression of the target-dependent reporter gene decreases in the host cell harboring
an altered or inactivated target gene, then a potential inhibitor of the regulon target
gene will decrease expression of the target-dependent reporter gene.
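
The direction rule stated above can be illustrated with a short Python sketch. It is a minimal illustration only: the reporter names, the expression values and the natural-log-ratio cutoff of 1.0 are hypothetical and are not taken from this specification.

```python
import math

def target_dependent_reporters(wild_type, mutant, min_abs_ln_ratio=1.0):
    """Steps b) and c): compare two profiles and keep genes whose expression changes.

    wild_type and mutant map gene name -> expression level (arbitrary units).
    min_abs_ln_ratio is an assumed cutoff, not a value from the text.
    """
    reporters = {}
    for gene, wt_level in wild_type.items():
        ln_ratio = math.log(mutant[gene] / wt_level)
        if abs(ln_ratio) >= min_abs_ln_ratio:
            reporters[gene] = ln_ratio
    return reporters

def expected_inhibitor_effect(ln_ratio):
    """A potential inhibitor should change the reporter in the same direction as the mutation."""
    return "increase" if ln_ratio > 0 else "decrease"

# Hypothetical expression levels for three reporter genes.
wild_type = {"REP1": 1.0, "REP2": 8.0, "REP3": 2.0}
mutant = {"REP1": 5.0, "REP2": 1.0, "REP3": 2.1}

reporters = target_dependent_reporters(wild_type, mutant)
effects = {gene: expected_inhibitor_effect(r) for gene, r in reporters.items()}
print(effects)
```

In step d), a compound would then be scored as a potential inhibitor when it moves each selected reporter in the direction listed in `effects`.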

The method may further comprise the step, performed before step d), of
assessing the specificity of a potential target-dependent reporter gene by comparing
gene expression profiles of the potential target-dependent reporter gene to a plurality of
genes in a database of compiled gene expression profiles to generate individual
expression correlation coefficients, wherein a target-dependent reporter gene whose
expression correlates with the expression of the regulon target gene and with a minimal
number or no other gene is selected over one whose expression correlates with a
greater number of genes based on expression correlation coefficients. The method
may also encompass upstream sequences that control expression of the
target-dependent reporter genes fused to a heterologous coding sequence, wherein the
fusion is used to screen compounds for potential inhibitors of the regulon target gene, as
discussed above.
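
The specificity assessment described above can be sketched as follows. This is a hedged illustration rather than the method of the specification: Pearson correlation, the 0.8 cutoff, the gene names and the expression profiles are all assumptions introduced for the example.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    if ss_x == 0 or ss_y == 0:
        return 0.0
    return cov / math.sqrt(ss_x * ss_y)

def count_other_correlated(reporter, target, compendium, cutoff=0.8):
    """Number of genes, other than the reporter and the target, that correlate with the reporter."""
    profile = compendium[reporter]
    return sum(
        1
        for gene, other in compendium.items()
        if gene not in (reporter, target) and abs(pearson(profile, other)) >= cutoff
    )

def rank_by_specificity(candidates, target, compendium, cutoff=0.8):
    """Keep candidates that track the target; list the most specific (fewest other partners) first."""
    kept = [
        (count_other_correlated(c, target, compendium, cutoff), c)
        for c in candidates
        if abs(pearson(compendium[c], compendium[target])) >= cutoff
    ]
    return [name for _, name in sorted(kept)]

# Hypothetical compendium: gene -> expression across six treatments.
compendium = {
    "TARGET": [1, 2, 3, 4, 5, 6],
    "REP_A": [2, 4, 6, 8, 10, 12],
    "REP_B": [1, 2, 3, 4, 5, 9],
    "G1": [0, 0, 0, 0, 0, 1],
    "G2": [3, 1, 4, 1, 5, 2],
}
ranking = rank_by_specificity(["REP_A", "REP_B"], "TARGET", compendium)
print(ranking)
```

Here the hypothetical reporter `REP_A` is preferred because, apart from the target, it correlates with fewer genes in the compendium than `REP_B` does.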

In a preferred embodiment, the target gene is selected from YMR134w,
YER034w, YJL105w, YKL077w, YGR046w, YJR041c, YER044c and YLR100w and
their mammalian homologs.

Pharmaceutical Applications 

Compounds that bind to target proteins or regulate target gene
expression can be tested in yeast cell systems and heterologous host cell systems (e.g.,
human cells) to verify that they do not have undesirable side effects. In addition, the
yeast GRM can be used to make sure that the compounds do not adversely alter gene
transcription (e.g., in an undesirable way). Of course, certain changes in gene
expression may be inevitable and many of these will not be deleterious to the patient or
host organism. Once lead compounds have been identified, these compounds can be
refined further via rational drug design and other standard pharmaceutical techniques.

The compounds of this invention may be formulated into
pharmaceutical compositions and administered in vivo at an effective dose to treat a
particular disease or condition. Determination of a preferred pharmaceutical
formulation and a therapeutically efficient dose regimen for a given application is
within the skill of the art taking into consideration, for example, the condition and
weight of the patient, the extent of desired treatment and the tolerance of the patient
for the treatment.

Administration of the compounds of this invention, including isolated 
and purified forms, their salts or pharmaceutically acceptable derivatives thereof, may 
be accomplished using any conventionally accepted mode of administration. 

The pharmaceutical compositions of this invention may be in a variety
of forms, which may be selected according to the preferred modes of administration.
These include, for example, solid, semi-solid and liquid dosage forms such as tablets,
pills, powders, liquid solutions or suspensions, suppositories, and injectable and
infusible solutions. The preferred form depends on the intended mode of
administration and therapeutic application. Modes of administration may include oral,
parenteral, subcutaneous, intravenous, intralesional or topical administration.

The compounds of this invention may, for example, be placed into
sterile, isotonic formulations with or without cofactors which stimulate uptake or
stability. The formulation is preferably liquid, or may be lyophilized powder. For
example, the inhibitors may be diluted with a formulation buffer comprising 5.0 mg/ml
citric acid monohydrate, 2.7 mg/ml trisodium citrate, 41 mg/ml mannitol, 1 mg/ml
glycine and 1 mg/ml polysorbate 20. This solution can be lyophilized, stored under
refrigeration and reconstituted prior to administration with sterile Water-For-Injection
(USP).
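
As a worked example of the buffer recipe above, the following Python sketch scales the stated concentrations to a chosen batch volume. The 100 ml batch size is an arbitrary assumption for illustration; only the per-millilitre concentrations come from the text.

```python
# Concentrations stated in the text, in mg per ml of formulation buffer.
BUFFER_MG_PER_ML = {
    "citric acid monohydrate": 5.0,
    "trisodium citrate": 2.7,
    "mannitol": 41.0,
    "glycine": 1.0,
    "polysorbate 20": 1.0,
}

def buffer_masses_mg(volume_ml):
    """Mass of each component (mg) needed for volume_ml of buffer."""
    return {name: conc * volume_ml for name, conc in BUFFER_MG_PER_ML.items()}

# Hypothetical 100 ml batch.
masses = buffer_masses_mg(100)
for name, mg in masses.items():
    print(f"{name}: {mg:.0f} mg")
```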

Topical administration includes administration to the skin or mucosa,
including surfaces of the lung and eye. Compositions for topical administration,
including those for inhalation, may be prepared as a dry powder which may be
pressurized or non-pressurized. In non-pressurized powder compositions, the active
ingredient in finely divided form may be used in admixture with a larger-sized
pharmaceutically acceptable inert carrier comprising particles having a size, for
example, of up to 100 micrometers in diameter. Alternatively, the composition may be
pressurized and contain a compressed gas, such as nitrogen or a liquified gas
propellant. The liquified propellant medium and indeed the total composition is
preferably such that the active ingredient does not dissolve therein to any substantial
extent.

Dosage forms for topical or transdermal administration of a compound
of this invention include ointments, pastes, creams, lotions, gels, powders, solutions,
sprays, inhalants or patches. The active component is admixed under sterile conditions
with a pharmaceutically acceptable carrier and any needed preservatives or buffers as
may be required. Ophthalmic formulations, ear drops, eye ointments, powders and
solutions are also contemplated as being within the scope of this invention.

The pharmaceutical compositions of this invention may also be
administered using microspheres, microparticulate delivery systems or other sustained
release formulations placed in, near, or otherwise in communication with affected
tissues or the bloodstream. Suitable examples of sustained release carriers include
semipermeable polymer matrices in the form of shaped articles such as suppositories or
microcapsules. Implantable or microcapsular sustained release matrices include
polylactides (U.S. Patent No. 3,773,319; EP 58,481), copolymers of L-glutamic acid
and gamma ethyl-L-glutamate (Sidman et al., 1985); poly(2-hydroxyethyl-methacrylate)
or ethylene vinyl acetate (Langer et al., 1981, Langer, 1982).

The compounds of this invention may also be attached to liposomes,
which may optionally contain other agents to aid in targeting or administration of the
compositions to the desired treatment site. Attachment of the compounds to
liposomes may be accomplished by any known cross-linking agent such as
heterobifunctional cross-linking agents that have been widely used to couple toxins or
chemotherapeutic agents to antibodies for targeted delivery. Conjugation to liposomes
can also be accomplished using the carbohydrate-directed cross-linking reagent
4-(4-maleimidophenyl) butyric acid hydrazide (MPBH) (Duzgunes et al., 1992), herein
incorporated by reference.

Liposomes containing pharmaceutical compounds may be prepared by
well-known methods (see, e.g., DE 3,218,121; Epstein et al., 1985; Hwang et al., 1980;
U.S. Patent Nos. 4,485,045 and 4,544,545). Ordinarily the liposomes are of the small
(about 200-800 Angstroms) unilamellar type in which the lipid content is greater than
about 30 mol.% cholesterol. The proportion of cholesterol is selected to control the
optimal rate of MAG derivative and inhibitor release.

The compositions also will preferably include conventional
pharmaceutically acceptable carriers well known in the art (see, e.g., Remington's
Pharmaceutical Sciences, 16th Edition, 1980, Mack Publishing Company). Such
pharmaceutically acceptable carriers may include other medicinal agents, carriers,
genetic carriers, adjuvants, excipients, etc., such as human serum albumin or plasma
preparations. The compositions are preferably in the form of a unit dose and will
usually be administered one or more times a day.
EXAMPLE 1: PREPARATION OF THE Genome Reporter Matrix™

Construction of Reporter Gene Fusions (Method 1) 

The regulatory region of each yeast gene was cloned into one of two
vectors, pAB1 or pAB2. The vector pAB1 was constructed in the following manner:
First, the polymerase chain reaction (PCR) was used to amplify the transcriptional
terminator region from the gene PGK1 using the oligonucleotides 5P-PGKTERM (5'-
GATTGAATTCAATTGAAATCGATAG-3') and 3P-PGKTERM (5'-
CCGAGGCGCCGAATTTTCGAGTTAT-3'). The amplified fragment consists of the
263 base-pair region immediately downstream of the PGK1 stop codon, and contains
an EcoRI site at the 5' end and a NarI site at the 3' end. These restriction sites were
engineered into the two PCR primers (underlined sequences). The terminator was then
cloned into YIplac211 that had been linearized with EcoRI and NarI, yielding pAB34.
Next, the coding region of the green fluorescent protein (GFP) from Aequorea victoria
was amplified by PCR using the oligonucleotides 5P-GFP-ORF (5'-
CATGTCTAGAGGAGAAGAACTTTTC-3') and 3P-GFP-ORF (5'-
CGCGAATTCCTATTTGTATAGTTCA-3'). Again, these oligonucleotides contain
engineered XbaI and EcoRI sites at the 5' and 3' ends, respectively (underlined). This
fragment was cloned into pAB34, linearized with XbaI and EcoRI, to produce pAB35.
Finally, the GFP-PGK terminator fragment was moved into the episomal vector
YEplac195 (9) as an XbaI/NarI fragment, thereby producing pAB1.
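
Because the underlining that marked the engineered sites is not reproduced in this text, the sites can be located in the four primers with a short Python sketch. The primer sequences are those given above; the recognition sequences used for EcoRI (GAATTC), NarI (GGCGCC) and XbaI (TCTAGA) are the standard ones and are supplied here as an assumption, since the text does not spell them out.

```python
# Standard recognition sequences (not stated in the text).
SITES = {"EcoRI": "GAATTC", "NarI": "GGCGCC", "XbaI": "TCTAGA"}

# Primer sequences as given in the text, written 5' to 3'.
PRIMERS = {
    "5P-PGKTERM": "GATTGAATTCAATTGAAATCGATAG",
    "3P-PGKTERM": "CCGAGGCGCCGAATTTTCGAGTTAT",
    "5P-GFP-ORF": "CATGTCTAGAGGAGAAGAACTTTTC",
    "3P-GFP-ORF": "CGCGAATTCCTATTTGTATAGTTCA",
}

def find_sites(sequence, sites=SITES):
    """Return {enzyme: 0-based position} for every recognition site present in the sequence."""
    return {name: sequence.find(site) for name, site in sites.items() if site in sequence}

for primer_name, sequence in PRIMERS.items():
    print(primer_name, find_sites(sequence))
```

Each primer carries exactly one of the three sites, close to its 5' end, which agrees with the cloning steps described above.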

The vector pAB2 is pAB1 with an altered multiple cloning site (MCS).
The new MCS contains 8 base-pair recognition sites for three restriction enzymes.
These larger 8 base-pair recognition sites occur less frequently throughout the yeast
genome than the 6 base-pair sites present in the MCS of pAB1. Thus, the utilization of
restriction enzymes that recognize 8 base-pair sequences to clone the various
regulatory regions (engineered into the PCR primers used to amplify the regions)
would minimize the occurrence of those sites within the regions themselves. To
construct pAB2, pAB1 was linearized with XbaI and SphI, dropping out the existing
MCS, and an adapter containing the new MCS was ligated in. The adapter was made
by hybridizing two oligonucleotides, 8Cutter (5'-
CGGCGCGCCGCGGCCGCATGGCCGGCCAAT-3') and 8CutEnd (5'-
CTAGATTGGCCGGCCATGCGGCCGCGGCGCGCCGCATG-3'). This adapter has
sites for the restriction enzymes FseI, NotI, and AscI (underlined).
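
The adapter design can be checked with a few lines of Python. The two oligonucleotide sequences are taken from the text; the recognition sequences for AscI (GGCGCGCC), NotI (GCGGCCGC) and FseI (GGCCGGCC), and the reading of the unpaired CTAG and CATG ends as XbaI- and SphI-compatible overhangs, are assumptions added for this sketch.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def reverse_complement(seq):
    """Reverse complement of a DNA sequence written 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))

# Adapter oligonucleotides as given in the text, written 5' to 3'.
EIGHT_CUTTER = "CGGCGCGCCGCGGCCGCATGGCCGGCCAAT"
EIGHT_CUT_END = "CTAGATTGGCCGGCCATGCGGCCGCGGCGCGCCGCATG"

# Standard 8 base-pair recognition sequences (not stated in the text).
EIGHT_BP_SITES = {"AscI": "GGCGCGCC", "NotI": "GCGGCCGC", "FseI": "GGCCGGCC"}

# The middle of 8CutEnd should pair with all of 8Cutter,
# leaving four unpaired nucleotides at each end of 8CutEnd.
paired_region = EIGHT_CUT_END[4:-4]
anneals = paired_region == reverse_complement(EIGHT_CUTTER)
five_prime_tail = EIGHT_CUT_END[:4]
three_prime_tail = EIGHT_CUT_END[-4:]
site_positions = {name: EIGHT_CUTTER.find(site) for name, site in EIGHT_BP_SITES.items()}

print(anneals, five_prime_tail, three_prime_tail, site_positions)
```

On this reading, the annealed adapter presents a 5' CTAG tail and a 3' CATG tail, which would match ends left by XbaI and SphI digestion of pAB1.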

The promoter regions were cloned utilizing PCR of genomic DNA 
prepared from a strain derived from S288c; JRY 147 (MATa SUC2 mal mel gal2 
GUP1). The promoter-specific primers were designed such that the proximal primer 
spanned the start codon of the specific gene and included a few (usually four) codons 
derived from the gene. The position of the distal primer was determined on a case-by- 
case basis depending on the distance to, and orientation of, the neighboring open 
reading frame (ORF) and the restriction sites present. Where the upstream ORF was 
positioned in a divergent orientation and within 1,200 base-pairs, the size of the 
promoter fragment amplified was adjusted such that all nucleotides up to, but not 
including, the start codon of the upstream ORF were present. In cases where the 
upstream ORF was situated in the same orientation, the amplified fragment was 
designed to extend into the coding region but not so as to include the start codon. 
Both primers had restriction enzyme recognition sites engineered into the ends to allow
the subsequent cloning of the PCR fragment into pAB1 or pAB2.

Construction of Reporter Gene Fusions (Method 2) 

In another method for constructing genome reporter constructs, a
vector comprising a marker gene having an amber mutation and a supF tRNA gene
which suppresses the amber mutation is used as the parent vector.

A plasmid cloning vector was constructed which comprises a mutant
β-lactamase gene with an amber mutation and a supF tRNA gene. Downstream of the
supF tRNA gene there is a "stuffer" DNA fragment which is flanked by BsmBI
restriction sites. The BsmBI restriction enzyme cuts outside of its six base pair
recognition sequence (see, e.g., New England Biolabs 96/97 Catalog, p. 23) and
creates a four nucleotide 5' overhang. When the plasmid cloning vector is digested
with BsmBI, the enzyme cleaves within the stuffer DNA and within the adjoining
tRNA gene and deletes the four 3' terminal nucleotides of the gene. The deleted supF
tRNA gene encodes a tRNA which cannot fold correctly and is non-functional, i.e., it
cannot suppress the amber mutation in the mutant β-lactamase gene (β-lactamase
(amber)). Downstream from the stuffer DNA fragment is the coding region of a
modified green fluorescent protein ("GFP") gene.

The stuffer DNA was excised from the vector by digestion with BsmBI.
The double-stranded DNA at the supF-stuffer fragment junction, produced by BsmBI
digestion, is shown below. The tRNA gene sequences are indicated in bold:

5' . . supF . . TC       CCCCGGAGACGTC . . stuffer . . 3'
3' . . . . . .  AGGGGG       CCTCTGCAG . . . . . . . . 5'
                              BsmBI

The 3' terminal sequence of the supF gene necessary for proper
function is TCCCCCACCA. The vector, once cleaved with BsmBI, lacks the supF
tRNA ACCA terminal nucleotides if the overhangs self-anneal during
recircularization of the plasmid in the absence of insert.

A DNA insert containing the upstream regulatory sequence from a
yeast ORF was generated as a PCR fragment. Two oligonucleotides were designed to
flank the DNA insert sequences of interest on a template DNA and anneal to opposite
strands of the template DNA. These oligonucleotides also contained a sequence at
their respective 5' ends that, when converted into a 5' overhang (in the double-stranded
PCR fragment generated using the oligonucleotides), is complementary to the
overhangs on the cloning vector generated by BsmBI endonucleolytic cleavage.

Oligonucleotide #1 comprises the 5' terminal sequence: 5' CCCCACCA
... The remaining nucleotides 3' to this sequence were designed to anneal to
sequences at one end of the DNA insert of choice, in this Example, to one of a
multitude of yeast expression control sequences.

As highlighted in bold above, oligonucleotide #1 comprises the base
pairs needed to restore the wild-type 3' terminal end of the supF tRNA gene. These
base pairs are located immediately 3' to the sequence that allows the insert to anneal to
the overhang in the BsmBI-digested pAB4 vector.

Oligonucleotide #2 comprises the 5' terminal sequence: 5' TCCTG ...
The remaining nucleotides 3' to this sequence were designed to anneal to sequences at
the other end of the DNA insert of choice, in this Example, to one of a variety of yeast
expression control sequences which may be used according to this invention.

The DNA template (S. cerevisiae genomic DNA) and the two
oligonucleotides were annealed and the hybrids were amplified by polymerase chain
reaction using Klentaq™ polymerase and PCR buffer according to the manufacturer's
instructions (Clontech). Briefly, 15 ng S. cerevisiae genomic DNA served as template
DNA in a 10 µl PCR reaction containing 0.2 mM dNTPs, PCR buffer, Klentaq™
polymerase, and 1 µl of an 8 µM solution containing the primer pairs. The PCR
reaction mixture was subjected to the following steps: a) 94°C for 3 min; b) 94°C for
15 sec; c) 52°C for 30 sec; d) 72°C for 1 min, 45 sec; and e) 4°C indefinitely. Steps
b) through d) were repeated for a total of 30 cycles. The PCR amplification product
was purified away from other components of the reaction by standard methods.
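
The reaction set-up above implies a final primer concentration and a minimum programmed run time, which the following Python sketch works out. The variable names are hypothetical, and the run time ignores temperature ramping and the final 4°C hold, which depend on the instrument.

```python
# Values taken from the protocol above.
reaction_volume_ul = 10.0
primer_stock_um = 8.0        # concentration of the primer-pair solution, in µM
primer_volume_ul = 1.0

initial_denaturation_s = 3 * 60              # step a): 94°C for 3 min
cycle_steps_s = [15, 30, 60 + 45]            # steps b), c), d)
cycles = 30

# Dilution: C_final = C_stock * V_added / V_total
final_primer_um = primer_stock_um * primer_volume_ul / reaction_volume_ul

# Programmed hold time only (no ramping, no final hold).
total_s = initial_denaturation_s + cycles * sum(cycle_steps_s)

print(f"final primer concentration: {final_primer_um:.1f} µM")
print(f"programmed time: {total_s} s ({total_s / 60:.0f} min)")
```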

To generate the desired 5' overhangs on the ends of the PCR
amplification product, the PCR fragment was treated with DNA polymerase I in the
presence of dTTP and dCTP. Under these conditions, DNA polymerase I fills in 3'
overhangs with its 5' to 3' polymerase activity and also generates 5' overhangs with its
3' to 5' exonucleolytic activity, which, in the presence of excess dTTP and dCTP,
removes nucleotides in a 3' to 5' direction until a thymidine or a cytosine, respectively,
is removed and then replaced.

The overhangs generated by this reaction are:

a) At the 5' end (supF tRNA restoring end) of the DNA insert:

5' CCCCACCA . .     becomes     5' CCCCACCA . .
   GGGGTGGT . .                        TGGT . .

b) At the 3' end of the DNA insert (joined to the GFP coding sequence):

5' CAGGA . .     becomes     5' C
   GTCCT . .                    GTCCT . .
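
The chew-back that produces these overhangs can be simulated in Python. This is a simplified sketch under stated assumptions: each primer is represented only by its 5' terminal sequence as given above, the strand that is trimmed is taken to be the complement of that primer sequence, and trimming is assumed to stop at the first T or C reached from the 3' end (the nucleotides that can be replaced from the dTTP and dCTP supplied).

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def reverse_complement(seq):
    """Reverse complement of a DNA sequence written 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))

def chew_back_3prime(strand, replaceable=("T", "C")):
    """Trim a 5'->3' strand from its 3' end until the terminal base is replaceable."""
    end = len(strand)
    while end > 0 and strand[end - 1] not in replaceable:
        end -= 1
    return strand[:end]

def five_prime_overhang(primer_5prime_end, replaceable=("T", "C")):
    """Single-stranded 5' overhang left on the primer strand after chew-back of its partner."""
    partner = reverse_complement(primer_5prime_end)
    trimmed = chew_back_3prime(partner, replaceable)
    removed = len(partner) - len(trimmed)
    return primer_5prime_end[:removed]

print(five_prime_overhang("CCCCACCA"))  # oligonucleotide #1 end
print(five_prime_overhang("TCCTG"))     # oligonucleotide #2 end
```

Under these assumptions each end is left with a four-nucleotide 5' overhang, which is the length of overhang that BsmBI leaves on the vector.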

This DNA insert, now comprising 5' overhangs compatible with one of
each of the ends of the BsmBI-cleaved pAB4 vector, was used as substrate in a
standard ligation reaction with the BsmBI-cleaved pAB4 vector. The resulting ligation
mixture was used to transform competent E. coli cells. The cells were plated on agar
plates in the presence of ampicillin.

Colonies that grew in the presence of ampicillin were producing
functional β-lactamase enzyme and each harbored the desired recombinant DNA
molecule, having a DNA insert with a yeast expression control sequence inserted
upstream of the modified GFP coding region. The supF gene on vectors which
re-ligated without a DNA insert did not express a functional supF tRNA and did not
make functional β-lactamase. Thus, they were not found in transformed host cells
grown on ampicillin.



Construction of Yeast Strains

Strain ABY11 (MATa leu2Δ1 ura3-52) of S. cerevisiae was used.
ABY11 is derived from S288c. GRM arrays were grown at 30°C on solid casamino
acid medium (Difco) with 2% glucose and 0.5% UltraPure Agarose (Gibco BRL).
The medium was supplemented with additional amino acids and adenine (Sigma) at the
following concentrations: adenine and tryptophan at 30 µg/ml; histidine, methionine,
and tyrosine at 20 µg/ml; leucine and lysine at 40 µg/ml. Stock solutions of the
supplements were made at 100x concentrations in water. Yeast cells were transformed
with the reporter plasmids prepared by Method 1 or Method 2 (above) by the lithium
acetate method (Ito et al., 1983, and Schiestl and Gietz, 1989).
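
The 100x stock concentrations implied by the supplement list above can be worked out as follows. The final concentrations are those given in the text; the one-litre batch size is an arbitrary assumption for illustration.

```python
# Final concentrations in the medium (µg/ml), as listed in the text.
FINAL_UG_PER_ML = {
    "adenine": 30, "tryptophan": 30,
    "histidine": 20, "methionine": 20, "tyrosine": 20,
    "leucine": 40, "lysine": 40,
}
STOCK_FOLD = 100  # stocks are made at 100x

def stock_mg_per_ml(final_ug_per_ml, fold=STOCK_FOLD):
    """Concentration of a stock solution, converted from µg/ml to mg/ml."""
    return final_ug_per_ml * fold / 1000.0

def stock_volume_ml(medium_volume_ml, fold=STOCK_FOLD):
    """Volume of each stock to add to a given volume of medium."""
    return medium_volume_ml / fold

stocks = {name: stock_mg_per_ml(conc) for name, conc in FINAL_UG_PER_ML.items()}
per_litre = stock_volume_ml(1000)  # hypothetical 1 litre batch
print(stocks, per_litre)
```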



Determinations of Reporter Gene Expression Levels

Solutions of test compounds were added directly to the yeast strains or 
were coated on plates prior to addition of the yeast strains. The individual strains 
comprising the GRM were maintained as independent colonies (and cultures) in a 96- 
well format, in medium selecting for the URA3-containing reporter plasmid. Prior to 

25 each experiment, fresh dilutions of the reporter-containing strains were inoculated and 

grown overnight at 30°C. A Hamilton MicroLab 4200, a multichannel gantry robot 
equipped with a custom pin tool device capable of dispensing 50 nanoliter volumes in a 
highly reproducible manner, was used to array the matrix of yeast strains in a uniform 






manner onto solid agar growth media at a density of 1536 reporter strains per 110 cm²
plate. Arraying fifty nanoliters of yeast liquid culture onto solid medium with the
Hamilton MicroLab 4200 results in colony-to-colony signal reproducibility of less than
5% variation. Once arrayed, each plate was grown at 30°C for 18 hours or at 25°C
for 24 hours.
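
The reproducibility figure above can be checked per plate with a simple calculation.
The sketch below is illustrative only. It assumes that "variation" means the sample
standard deviation divided by the mean of replicate colony signals, which the text
does not state; the function name and the example values are hypothetical.

```python
import statistics


def percent_variation(signals):
    """Coefficient of variation (%) of replicate colony signals.

    Assumption: "variation" is taken as sample standard deviation / mean.
    """
    if len(signals) < 2:
        raise ValueError("need at least two replicate colonies")
    mean = statistics.mean(signals)
    if mean <= 0:
        raise ValueError("mean signal must be positive")
    return 100.0 * statistics.stdev(signals) / mean


# Hypothetical replicate fluorescence values for one reporter strain.
replicates = [100.0, 102.0, 98.0, 101.0, 99.0]
print(percent_variation(replicates) < 5.0)  # True for these toy values
```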

The level of fluorescence expressed from each reporter gene fusion was
determined using a Molecular Dynamics FluorImager SI. AIS image analysis software
(Imaging Research, Ontario, Canada) was used to quantitate the fluorescence of each
colony in the images. Generally, the drug treatments were performed at several
concentrations, with the analysis based upon the concentration producing the most
informative expression profile.

EXAMPLE 2: IDENTIFICATION OF HES1 AS A REGULON INDICATOR GENE

The effects of Simvastatin on the Genome Reporter Matrix™ were
tested at a concentration of 20 µg/ml. The HES1 reporter gene construct was induced
by a natural log ratio of 4.2 (treated/untreated), indicating that the HES1 reporter
had an excellent signal-to-noise ratio upon induction by Simvastatin. The HES1 gene
encodes a protein with significant similarity to oxysterol binding
proteins and has been implicated in isoprenoid metabolism (Figure 35). Analysis of
gene expression data with the Genome Reporter Matrix™ revealed that HES1
expression is highly correlated with expression of genes encoding enzymes of the
isoprenoid biosynthetic pathway (Figure 36).
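
For readers more used to fold changes, the natural log ratio quoted above can be
converted directly. The following is a minimal sketch; the function names are
illustrative and are not taken from the specification.

```python
import math


def ln_ratio(treated, untreated):
    """Natural log of the treated/untreated reporter signal."""
    if treated <= 0 or untreated <= 0:
        raise ValueError("signals must be positive")
    return math.log(treated / untreated)


def fold_change(ln_ratio_value):
    """Convert a natural log ratio back to a treated/untreated fold change."""
    return math.exp(ln_ratio_value)


# A natural log ratio of 4.2 corresponds to roughly a 67-fold induction.
print(round(fold_change(4.2), 1))  # 66.7
```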

The specificity of the HES1 reporter for inhibitors of ergosterol
biosynthesis was tested in silico. The expression of the HES1 reporter was examined
in data from 710 experimental treatments of the Genome Reporter Matrix™. Basal
levels of HES1 reporter gene expression were 0.1 units. Units are defined as an
arbitrary fluorescence value that has been normalized such that a value of 1.0 equals the
mean reporter fluorescence level of all members of the Genome Reporter Matrix™ in a
given experiment. All treatments (a total of 51) that induced HES1 reporter gene
levels to 0.5 units or greater were treatments known to inhibit ergosterol biosynthesis,
indicating a high degree of specificity for this pathway (Figure 37).
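
The in silico specificity test described above can be sketched as follows. This is a
minimal illustration under stated assumptions: the function names, the treatment
labels and the numeric values are hypothetical, and only the unit definition (mean of
all reporters equals 1.0) and the 0.5-unit cutoff come from the text.

```python
def normalize_to_units(raw_signals):
    """Rescale one experiment so the mean over all reporters equals 1.0."""
    mean_signal = sum(raw_signals.values()) / len(raw_signals)
    if mean_signal <= 0:
        raise ValueError("mean signal must be positive")
    return {name: value / mean_signal for name, value in raw_signals.items()}


def induction_specificity(units_by_treatment, known_inhibitors, threshold=0.5):
    """Return (inducing treatments, fraction of them that are known inhibitors).

    The fraction is None when no treatment reaches the threshold.
    """
    inducing = sorted(t for t, u in units_by_treatment.items() if u >= threshold)
    if not inducing:
        return inducing, None
    on_target = sum(1 for t in inducing if t in known_inhibitors)
    return inducing, on_target / len(inducing)


# Hypothetical reporter levels (in normalized units) across four treatments.
reporter_units = {"untreated": 0.1, "azole_A": 1.2, "statin_B": 0.8, "unrelated_C": 0.12}
inducing, fraction = induction_specificity(reporter_units, {"azole_A", "statin_B"})
print(inducing, fraction)
```

A fraction of 1.0 corresponds to the situation reported above, in which every
treatment that reached the cutoff was a known inhibitor of the pathway.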

The utility of the HES1 reporter gene in a high-throughput screen was
tested by incubating a yeast strain harboring the HES1 reporter in a 384-well array
containing various concentrations of ergosterol biosynthesis inhibitors (Econazole and
Simvastatin) and nonspecific drugs (Flucytosine and Nifedipine). Cells were grown to
mid-log phase at 30°C in casamino acids medium (0.67% yeast nitrogen base, 2%
glucose, 2% casamino acids). Cell density was adjusted prior to incubation in various
concentrations of drug. Arrays were incubated at 30°C for 24 hrs prior to imaging.
The HES1 reporter was found to be specifically induced by Econazole and Simvastatin
but not by Flucytosine or Nifedipine.

To further test the viability of this indicator gene in a high-throughput
screen, the regulation of the HES1 reporter was tested in two different strain
backgrounds. ABY11 (MATa leu2Δ1 ura3-52) is a wild-type strain. ABY140
(MATa his3Δ1 leu2Δ0 met15Δ0 pdr5::KanMX ura3Δ0 yor1::KanMX) is a strain
containing mutations in two multidrug resistance genes. Induction of the HES1
reporter gene in ABY140 was found to be more sensitive to Simvastatin and Econazole,
but not to Flucytosine or Nifedipine, when compared to ABY11.

The ABY140 [HES1] strain was used to screen approximately 16,800
chemicals from a combinatorial chemistry library. One percent of these chemicals
induced the HES1 indicator gene. Twenty-four of these chemicals were further tested
in a secondary screen for the ability to induce four additional indicator (also referred to
as reporter) genes whose expression is also coordinately regulated with genes
encoding ergosterol biosynthetic enzymes. Eight of these twenty-four chemicals also
induced these reporter genes, suggesting that these chemicals interfere with ergosterol
biosynthesis.
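
The two-stage screen described above (about 168 primary hits from roughly 16,800
chemicals, 24 of which were retested) can be expressed as a simple filter. The sketch
below is hypothetical: it assumes a compound is confirmed only if it induces every
additional reporter, and it uses an arbitrary natural log ratio cutoff of 1.0. Neither
rule is stated in the text, and the compound and reporter names are placeholders.

```python
import math


def is_induced(treated, untreated, min_ln_ratio):
    """True if ln(treated/untreated) reaches the chosen cutoff."""
    return math.log(treated / untreated) >= min_ln_ratio


def confirmed_in_secondary(treated_signals, untreated_signals, min_ln_ratio=1.0):
    """Return compounds that induce all secondary reporters.

    treated_signals: {compound: {reporter: signal with compound}}
    untreated_signals: {reporter: signal without compound}
    """
    confirmed = []
    for compound, by_reporter in treated_signals.items():
        if all(
            is_induced(signal, untreated_signals[reporter], min_ln_ratio)
            for reporter, signal in by_reporter.items()
        ):
            confirmed.append(compound)
    return confirmed


# Hypothetical signals for two primary hits and two secondary reporters.
untreated = {"REP_1": 0.1, "REP_2": 0.2}
treated = {
    "cmpd_1": {"REP_1": 1.0, "REP_2": 1.0},  # both reporters induced
    "cmpd_2": {"REP_1": 1.0, "REP_2": 0.2},  # second reporter unchanged
}
print(confirmed_in_secondary(treated, untreated))
```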

This example reveals how a high-quality promoter sequence identified
from systematic genome expression data can be employed with a significant degree of
confidence to identify chemicals with a desired biological activity.

The DNA and amino acid sequences of HES1 are shown in Figures 62
and 63, respectively.

EXAMPLE 3: IDENTIFICATION OF YJL105w AS A TARGET GENE

YJL105w was a previously uncharacterized ORF which contains a PHD
finger, suggesting that it functions as a transcription factor (Figure 1). Gene
expression correlation coefficients were calculated for 1532 reporter constructs,
including known genes involved in sterol biosynthesis. Several uncharacterized genes,
including YJL105w, were found to have highly correlated gene expression with genes
encoding sterol biosynthetic enzymes. YJL105w expression correlated very well (0.83)
with expression of CYB5, a gene involved in ergosterol biosynthesis (Figure 2).
Cyb5p is thought to be an electron donor for sterol modifying enzymes (Mitchell A.G.,
Martin C.E., J. Biol. Chem., 1995, 270(50):29766-72). Expression of YJL105w was
induced considerably by drugs that inhibit sterol biosynthesis as well as by a mutation
in the gene encoding HMG-CoA Synthase (Figure 3). The YJL105w reporter
construct comprises 1200 base-pairs of DNA sequence 5' to the ATG start codon and
thus contains sequence information sufficient to confer the observed regulated
expression.
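
The correlation analysis used in this and the following examples can be illustrated
with a short sketch. It assumes a Pearson correlation computed across treatments,
which this section does not specify; the helper names, the 0.8 cutoff and the toy
profiles are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("profiles must have equal length of at least 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant profile has no defined correlation")
    return sxy / math.sqrt(sxx * syy)


def correlated_partners(profiles, query, min_r=0.8):
    """Rank reporters whose profile correlates with the query at or above min_r."""
    scores = [
        (name, pearson(profiles[query], profile))
        for name, profile in profiles.items()
        if name != query
    ]
    return sorted((s for s in scores if s[1] >= min_r), key=lambda s: s[1], reverse=True)


# Hypothetical ln-ratio profiles over five treatments.
profiles = {
    "GENE_X": [0.1, 1.9, 2.2, 0.0, 1.1],
    "KNOWN_A": [0.0, 2.0, 2.0, 0.1, 1.0],   # tracks GENE_X
    "KNOWN_B": [1.5, 0.2, 0.1, 1.4, 0.3],   # moves in the opposite direction
}
print([name for name, _ in correlated_partners(profiles, "GENE_X")])
```

Ranking by coefficient rather than applying the cutoff alone keeps the most tightly
co-regulated known genes at the top of the list, which is where the functional
assignment is drawn from.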

To test whether YJL105w has a role in isoprenoid metabolism, a
yjl105w mutant in which the entire ORF was replaced with the kanamycin resistance gene
was constructed. Approximately 5 × 10⁶ cells of the yjl105w mutant strain and a
wild-type control strain (ABY363, MATa his3Δ1 leu2Δ0 lys2Δ0 ura3Δ0) were plated
onto separate non-selective agar plates. The sterol biosynthetic inhibitor lovastatin
(250 µg) was applied to a sterile disk on each lawn and the cells were allowed to grow
overnight at 30°C. The yjl105w mutant strain was found to be significantly more
resistant to lovastatin treatment, further implicating this ORF in lipid metabolism
(Figure 4).

YJL105w appears to be fungal-specific since no apparent mammalian
counterparts were found. Although YJL105w is not an essential gene, it could provide
utility for constructing strains for specific applications. For instance, the resistance to
lovastatin conferred by a yjl105w mutant could result from an elevated flux through the
isoprenoid biosynthetic pathway. Such a condition may result from an altered
composition of the cell's lipid bilayer that triggers the induction of synthesis of
isoprenoid biosynthetic enzymes and/or reduces the cell's permeability to lovastatin. In
either of these cases, a strain defective for YJL105w could be useful for constructing
strains that could grow under extreme situations, such as in industrial applications.
Examples of extreme conditions include growth at high or low temperatures (>35°C or
<20°C) or in osmotically stressful conditions or in the presence of amphipathic solutes.
Alternatively, the resistance to lovastatin in the yjl105w mutant could result from
decreased expression of membrane transporters or channels that allow entry of foreign
compounds (xenobiotics). In this case, overexpression of YJL105w could produce a
highly permeabilized strain that would have numerous applications where entry of
compounds into a cell is limited by permeability or availability of compounds. A
mammalian counterpart of this ORF, if found, could be useful as a diagnostic marker
for people with high serum cholesterol levels. Individuals that have mutations, such as
null or weak (hypomorphic) alleles, might be expected to have a higher rate of sterol
synthesis.

The DNA and protein sequences of YJL105w are depicted in Figures 39
and 40, respectively.

EXAMPLE 4: IDENTIFICATION OF YMR134w AS A TARGET GENE

YMR134w is an ORF that had been suggested previously to be involved
in iron metabolism (Figure 5). Among 1532 reporter constructs, YMR134w
expression was found to be highly correlated with the expression of ERG2 (Figure 6),
and YMR134w is therefore likely to be involved in lipid metabolism. The YMR134w reporter
construct was found to be highly induced by various statins (inhibitors of HMG-CoA
reductase) and azole compounds (inhibitors of lanosterol 14-alpha demethylase,
ERG11) (Figure 7). The YMR134w reporter construct comprises 1200 base-pairs of
DNA sequence 5' to the ATG start codon and thus contains sequence information
sufficient to confer the observed regulated expression. A database search for
YMR134w-related protein sequences revealed a weak similarity to human vascular
endothelial growth factor receptor (Figure 8).

The DNA and protein sequences of YMR134w are depicted in Figures
41 and 42, respectively.

EXAMPLE 5: IDENTIFICATION OF YER044c AS A TARGET GENE

YER044c was a previously uncharacterized yeast ORF with one
predicted transmembrane domain (Figure 9). YER044c expression is significantly
correlated with the expression of ERG2 (0.82, Figure 10). Among 498 treatments of the
GRM, statins, azoles and a deletion mutant of the ERG11 gene induce expression of the
YER044c reporter construct most significantly (Figure 11). The YER044c
reporter construct comprises 1200 base-pairs of DNA sequence 5' to the ATG start
codon and thus contains sequence information sufficient to confer the observed
regulated expression. DNA and protein sequence database comparisons with the
predicted protein sequence of YER044c revealed an apparent Schizosaccharomyces
pombe counterpart and numerous apparent mammalian EST counterparts (Figures
12-14).

The DNA and protein sequences of YER044c are depicted in Figures
43 and 44, respectively. The apparent mouse, human and rat EST counterparts of
YER044c are depicted in Figures 45-47, respectively.

EXAMPLE 6: IDENTIFICATION OF YLR100w AS A TARGET GENE

YLR100w was a previously uncharacterized yeast ORF (Figure 15).
Expression of YLR100w correlated significantly (0.82) with CYB5 in the GRM
composed of 6036 reporter constructs in 706 experimental treatments. The correlation
of expression of YLR100w to the expression of CYB5 implied a role of YLR100w in
lipid metabolism. Expression of the YLR100w reporter was induced significantly by
statins, azoles and in a yeast erg11 mutant, consistent with a role of YLR100w in lipid
metabolism (Figure 17). Searches of DNA and protein sequence databases for similar
sequences revealed a GenBank entry for a 17-beta-hydroxysteroid dehydrogenase
mouse cDNA (Figure 18).

The sequence of the mouse cDNA is shown in Figure 53. Given the
protein sequence similarity (Figure 19) and the fact that yeast is not known to
synthesize steroid hormones, it is conceivable that the mouse cDNA encodes a protein
with another role in lipid metabolism. In this case, the mammalian protein could have
utility as a pharmacological target to modulate lipid metabolism. Another GenBank
entry was found for a rat ovarian specific protein with significant similarity to
YLR100w. The sequence of the rat protein is shown in Figure 65. Two mouse ESTs
were found to be significantly similar to YLR100w. The sequences of the two mouse
ESTs are shown in Figures 51 and 52. A human EST was found that was similar to
YLR100w, but to a lesser extent than the two mouse ESTs.

The DNA and protein sequences of YLR100w are depicted in Figures
48 and 49, respectively. The sequence of the human EST is shown in Figure 50.

EXAMPLE 7: IDENTIFICATION OF YER034w AS A TARGET GENE

YER034w is a yeast ORF that had been shown previously not to be
essential for cell viability (Figure 20). Expression of the YER034w reporter construct
was found to be correlated (0.75) with the expression of a GPA2 reporter construct in
a GRM composed of 1532 reporters treated under 498 experimental conditions
(Figure 21). GPA2 encodes the alpha subunit of a trimeric G protein involved in
pseudohyphal differentiation (Lorenz, M.C. and Heitman, J. EMBO J. 1997
16:7008-7018). This correlation suggested that YER034w had a role in pseudohyphal
growth and could represent a new antifungal target.

To test this hypothesis, a diploid homozygous yer034w knockout strain
was purchased from Research Genetics (Huntsville, AL). Wild-type cells (ABY13,
MATa/MATalpha his3Δ1/his3Δ1 leu2Δ0/leu2Δ0 met15Δ0/MET15 LYS2/lys2Δ0
ura3Δ0/ura3Δ0) and the homozygous yer034w knockout strain were plated onto low
nitrogen plates to stimulate pseudohyphal differentiation. After four days at 25°C,
plates were examined under a microscope. The yer034w knockout strain had
undergone significantly more differentiation than the wild-type control both in terms of
numbers of projections per colony (Figure 22) and the size of the hyphae. This result
implicated YER034w in the dimorphic transition of cells from yeast to pseudohyphae.
The ability of fungi to undergo this morphological transition has been suggested to be a
critical aspect of fungal pathogenicity. A search for related mammalian protein
sequences did not identify any obvious counterparts, suggesting that this protein is
fungal-specific and may be an amenable anti-fungal target.

The DNA and protein sequences of YER034w are depicted in Figures
54 and 55, respectively.



EXAMPLE 8: IDENTIFICATION OF YKL077w AS A TARGET GENE

YKL077w was a previously uncharacterized ORF with one predicted
transmembrane domain (Figure 23). Expression of the YKL077w reporter construct
was found to be correlated (0.92) with the expression of a SGV1 reporter construct in
a GRM composed of 1532 reporters treated under 498 experimental conditions
(Figure 24). Sgv1p is a Cdc28p-related protein kinase that is essential for cell
viability. In addition to SGV1 expression, YKL077w expression correlated highly
(>0.8) with PKC1 and RHO1 (Figure 25), genes involved in cell wall integrity and
cytoskeletal reorganization. Database searches with the predicted protein sequence of
YKL077w did not identify apparent mammalian counterparts (Figure 26). YKL077w
could represent an antifungal target given the lack of a mammalian homolog and its
proposed involvement in cellular structure and/or proliferation. Nevertheless, in the
event a mammalian counterpart is discovered, it could represent an anti-proliferative
target as well.

The DNA and protein sequences of YKL077w are depicted in Figures
56 and 57, respectively.



EXAMPLE 9: IDENTIFICATION OF YGR046w AS A TARGET GENE

YGR046w was a previously uncharacterized yeast ORF that has been
shown to be essential for viability (Figure 27). Expression of YGR046w correlated
significantly (0.90) with IRA2 in the GRM composed of 6036 reporter constructs in
706 experimental treatments (Figure 28). Ira2p is a GTPase activating protein (GAP)
for Ras1p and Ras2p. In addition to IRA2 expression, YGR046w expression correlated
very well (>0.77) with the expression of known genes involved in cell proliferation
functions (Figure 29). The expression of YGR046w was found to be most sensitive to
agents that disrupt mitochondrial function, create oxidative stress and disrupt the
cytoskeleton (Figure 30).

Given its proposed involvement in cell proliferation, YGR046w could
represent a target for modulation of cell growth. A search of protein and DNA
sequence databases did not reveal any apparent mammalian homologs. Nevertheless, if
such a sequence is identified, it may represent an anti-proliferative mammalian target.

The DNA and protein sequences of YGR046w are depicted in Figures
58 and 59, respectively.

EXAMPLE 10: IDENTIFICATION OF YJR041c AS A TARGET GENE

Mutant strains defective for YJR041c have been shown previously to
display a severe growth defect, but no function for YJR041c was known (Figure 31).
Expression of YJR041c correlated significantly (0.83) with MED7 in the GRM
composed of 6036 reporter constructs in 706 experimental treatments (Figure 32).
MED7 encodes a component of the mediator complex involved in RNA polymerase II
transcription. YJR041c expression was also found to correlate significantly (>0.71)
with several genes involved in different aspects of RNA metabolism. These processes
include RNA polymerase I and II transcription, mRNA splicing, RNA turnover and
ribosome function (Figure 33).

Database searches for related sequences identified similar sequences
from Schizosaccharomyces pombe (Figure 34). No obvious mammalian counterparts
were identified, suggesting that YJR041c is a fungal-specific protein. Given these
factors, YJR041c could represent an attractive target for antifungal therapy. In the
event a mammalian counterpart is identified, it also could represent a target with utility
for modulating cell proliferation.

The DNA and protein sequences of YJR041c are shown in Figures 60
and 61, respectively.

EXAMPLE 11: SCREENING ASSAY USING THE GENOME REPORTER
MATRIX™ TO IDENTIFY TARGET INHIBITORS

A mutant or conditional allele of a target yeast gene is produced as
discussed above. The allele may be conditional either for function or expression. For
instance, the conditional allele may be a temperature-sensitive allele of the target gene
or the target gene may be operably linked to an inducible promoter for regulated
expression. In a preferred embodiment, the target gene is operably linked to an
inducible promoter that permits expression anywhere between 0% and 500% of wild
type expression. The target gene of interest is transfected and expressed in yeast cells
of the GRM that have a functional deletion of the target gene of interest. The level of
expression of the conditional allele is varied between 0% and 500% of wild type
expression, and the expression of the reporter constructs of the GRM is measured in
response to the expression of the target gene. The expression of the reporter
constructs is then correlated to the expression of the target gene. Thus, one can
identify a subset of genes that are either induced or repressed by overexpression of the
target gene.
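
The step of correlating reporter expression with target gene expression can be
sketched as below. This is one possible reading, not the method of the specification:
it assumes a Pearson correlation across the titration series and an arbitrary cutoff
of 0.8, and all reporter names and values are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either series is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def classify_reporters(target_levels, reporter_profiles, min_abs_r=0.8):
    """Label reporters as induced or repressed by rising target expression.

    Reporters whose correlation magnitude is below min_abs_r are left out.
    """
    classes = {}
    for reporter, profile in reporter_profiles.items():
        r = pearson(target_levels, profile)
        if r >= min_abs_r:
            classes[reporter] = "induced"
        elif r <= -min_abs_r:
            classes[reporter] = "repressed"
    return classes


# Target expression as a percentage of wild type, and hypothetical reporter units.
target_levels = [0, 100, 250, 500]
reporter_profiles = {
    "REP_A": [0.1, 0.5, 1.2, 2.4],  # rises with the target
    "REP_B": [2.0, 1.5, 0.9, 0.2],  # falls as the target rises
    "REP_C": [1.0, 1.1, 0.9, 1.0],  # essentially unaffected
}
print(classify_reporters(target_levels, reporter_profiles))
```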

The yeast strains containing the subset of genes whose expression is
dependent upon overexpression, and thus the function, of the essential gene are then
used to screen compounds that are potential target inhibitors. The yeast strains are
incubated with the compounds. If a reporter gene in a particular yeast strain is induced
by overexpression of the target gene, then potential inhibitors are screened for the
ability to downregulate the reporter gene. Conversely, if a reporter gene is repressed
by overexpression of the target gene, then potential inhibitors are screened for the
ability to upregulate the reporter gene. Potential inhibitors are screened for the ability
to appropriately upregulate and downregulate a number of the genes whose expression
is dependent upon expression or overexpression of the target gene. When potential
target inhibitors are identified, these candidate compounds are tested for their ability to
inhibit the pathway that the target gene is part of. For instance, if the target gene is
YER034w, then the inhibitor may be tested for antifungal activity.
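
The screening rule in the preceding paragraph amounts to checking that a compound
moves each reporter in the direction opposite to the effect of target overexpression.
The following sketch scores a compound on that basis. The 0.5 natural log ratio
cutoff, the function name and the data are assumptions made for illustration.

```python
def inhibitor_like_fraction(overexpression_effect, compound_ln_ratios, min_abs_ln_ratio=0.5):
    """Fraction of reporters that respond to a compound as an inhibitor would.

    overexpression_effect: {reporter: "induced" or "repressed"} under target
        overexpression.
    compound_ln_ratios: {reporter: ln(treated/untreated)} for the compound.
    """
    # Reporters induced by the target should fall; repressed ones should rise.
    expected_sign = {"induced": -1, "repressed": 1}
    matches = 0
    for reporter, effect in overexpression_effect.items():
        change = compound_ln_ratios[reporter]
        if expected_sign[effect] * change >= min_abs_ln_ratio:
            matches += 1
    return matches / len(overexpression_effect)


# Hypothetical reporter classes and compound responses.
effects = {"REP_A": "induced", "REP_B": "repressed"}
compound_x = {"REP_A": -1.2, "REP_B": 0.9}  # opposite to overexpression on both
compound_y = {"REP_A": 0.8, "REP_B": 0.9}   # mimics overexpression on REP_A
print(inhibitor_like_fraction(effects, compound_x))  # 1.0
print(inhibitor_like_fraction(effects, compound_y))  # 0.5
```

Scoring across several reporters, rather than one, reflects the statement that
candidates should appropriately regulate a number of target-dependent genes.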

If a target gene has a plant or animal counterpart, one may express the
plant or animal counterpart in a yeast strain lacking the target gene to see if the plant
or animal counterpart can functionally substitute for the yeast gene. If it can, then the
plant or animal counterpart can be used in the above example to screen for potential
inhibitors of the plant or animal target. This is especially useful if the target gene
has a mammalian counterpart. Similarly, even if a plant, animal or mammalian
counterpart has not been identified, potential inhibitors may be tested for their ability to
inhibit the pathway that the target gene is part of, if that pathway is shared by yeast and
higher eukaryotes.

EXAMPLE 12: SIMULTANEOUS TRACKING OF MULTIPLE REPORTERS
AS REGULON INDICATOR GENES

The effects of inactivating an osmotic stress pathway were tested by
deleting a pathway component (Hog1p stress-activated protein kinase). Using the
hog1 knock-out profile as a model, multiple RIGs that would specifically indicate
pathway inhibitors were identified and tested in silico by examining all conditions in
which selected RIGs were activated or repressed. It was determined that
simultaneously monitoring up-regulation of PGU1 and down-regulation of DAK1 gave
good specificity for pathway inactivation as determined by the separation of the hog1
knock-out profile from all other conditions in which these two reporters were affected
(Figure 74). In this example, RIGs were not part of the target regulon but were
chosen empirically based on behavior under all conditions.

Similarly, two RIGs were identified that could specifically indicate
mitochondrial inactivation by comparing the behavior of these RIGs in the subset of
treatments that target mitochondria with all treatments that affect these RIGs. It was
determined that simultaneously measuring up-regulation of two RIGs (STE18 and
YGL198w) provides good specificity for mitochondrial perturbations as determined by
the separation of this subset of common treatments from all other conditions that affect
these RIGs (Figure 75).
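
The benefit of tracking two reporters at once can be illustrated with a small
in silico check. The sketch below is hypothetical: the condition names and natural
log ratios are invented, and the 0.5 cutoff is an assumption. It shows only how a
two-reporter signature can exclude conditions that a single reporter would admit.

```python
def matches_signature(ln_ratios, signature, min_abs_ln_ratio=0.5):
    """True if every reporter in the signature moves in the required direction."""
    for reporter, direction in signature.items():
        change = ln_ratios.get(reporter, 0.0)
        if direction == "up":
            if change < min_abs_ln_ratio:
                return False
        elif direction == "down":
            if change > -min_abs_ln_ratio:
                return False
        else:
            raise ValueError("direction must be 'up' or 'down'")
    return True


def signature_specificity(conditions, signature, on_target):
    """Return (matching conditions, fraction of them that are on target)."""
    matching = sorted(
        name for name, ratios in conditions.items() if matches_signature(ratios, signature)
    )
    if not matching:
        return matching, None
    hits = sum(1 for name in matching if name in on_target)
    return matching, hits / len(matching)


# Hypothetical ln(treated/untreated) values for two reporters.
conditions = {
    "hog1_knockout": {"PGU1": 1.5, "DAK1": -1.2},
    "treatment_A": {"PGU1": 1.1, "DAK1": 0.3},
    "treatment_B": {"PGU1": -0.2, "DAK1": -1.0},
}
on_target = {"hog1_knockout"}
print(signature_specificity(conditions, {"PGU1": "up"}, on_target))
print(signature_specificity(conditions, {"PGU1": "up", "DAK1": "down"}, on_target))
```

In this toy data set the single-reporter signature also admits an off-target
treatment, while the combined signature matches only the knock-out profile.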

All publications and patent applications cited in this specification are
herein incorporated by reference as if each individual publication or patent application
were specifically and individually indicated to be incorporated by reference. Although
the foregoing invention has been described in some detail by way of illustration and
example for purposes of clarity of understanding, it will be readily apparent to those of
ordinary skill in the art in light of the teachings of this invention that certain
changes and modifications may be made thereto without departing from the spirit or
scope of the appended claims.

CLAIMS

We claim: 

1. A method for placing Gene X, a gene of unknown function, into a
functional genetic group comprising the steps of:

a) generating a gene expression profile for Gene X,

b) comparing the gene expression profile of Gene X with gene
expression profiles of a plurality of other genes in a database of
compiled gene expression profiles to generate expression
correlation coefficients;

c) identifying based on their expression correlation coefficients a
set of genes comprising Gene X that are coordinately expressed;

d) determining if the one or more genes whose expression is most
highly correlated with that of Gene X belong to a gene regulon
involved in a known biological pathway, or a common set of
biological reactions or functions; and

e) optionally testing the effect on Gene X expression of at least
one altered condition or treatment known to affect the function
to which Gene X has been ascribed;

wherein Gene X is placed in the gene regulon of d) if Gene X expression is
coordinate with expression of that regulon.

2. A method for identifying a regulon indicator gene in a database of
compiled gene expression profiles, wherein expression of the regulon indicator gene
correlates with the expression of at least one known gene in a group of coordinately
expressed genes or provides a measure of the function of a biological process of
interest, the method comprising the steps of:

a) comparing gene expression profiles of a plurality of genes in the
database to generate expression correlation coefficients;

b) identifying based on their relative expression correlation
coefficients a set of genes that are coordinately expressed;

c) selecting a set of genes from b) which comprises one or more
genes known to function in a particular biological pathway, or a
common set of biological reactions or functions;

d) selecting a member of the set of c) having one or more of the
following characteristics:

1) its expression profile is sensitive to one or more stimuli;

2) its expression profile exhibits a large dynamic range in
response to one or more stimuli;

3) its expression profile exhibits a rapid kinetic response to
one or more stimuli;

4) its expression profile is specific to a known biological
pathway or a common set of biological reactions or
functions;

5) the regulon indicator gene does not contain sequences
that are problematic for maintaining on plasmids when
introduced into host cells.

3. The method of claim 2, wherein the regulon indicator gene is
co-regulated with one or more genes in the group of coordinately expressed genes of c).

4. The method of claim 2, wherein the regulon indicator gene, upon
expression, controls the expression of at least one other gene in the group of
coordinately expressed genes of c).

5. The method of claim 2, wherein the regulon indicator gene is of
previously unknown function.

6. A method for selecting a novel regulon target gene from a database
of compiled gene expression profiles, comprising the steps of:

a) comparing gene expression profiles of a plurality of genes in the
database to generate expression correlation coefficients;

b) identifying based on their expression correlation coefficients a
set of genes that are coordinately expressed;

c) selecting from b) a set of genes comprising one or more genes
of unknown function and one or more genes known to function
in a particular biological pathway, or a common set of biological
reactions or functions of interest;

d) selecting from the set of c) at least one gene of unknown
function, Gene X, as a novel regulon target gene; wherein Gene
X is a gene whose expression profile closely correlates to the
expression profiles of the one or more genes of the set of c)
known to function in the particular biological pathway, or
common set of biological reactions or functions of interest.

7. The method of claim 6, further comprising the step of generating
individual correlation coefficients between the gene expression profile of Gene X and a
plurality of genes in the database to assess the selectivity of Gene X as a novel regulon
target gene.

8. The method of claim 6, further comprising the step of determining
whether the protein encoded by Gene X exhibits substantial homology to a human,
non-human mammal, avian, amphibian, fish, insect or plant protein.

9. The method of claim 8, wherein said determining comprises the
steps of hybridizing Gene X to genomic DNA from human, non-human mammal,
avian, amphibian, fish, insect or plant cells or tissue under low stringency conditions.

10. The method of claim 8, wherein said determining comprises the
steps of:

a) comparing the DNA sequence of Gene X to the DNA sequences
from other organisms or

b) obtaining an amino acid sequence encoded by Gene X and comparing
it to amino acid sequences from other organisms.

11. The method of any one of claims 8-10, wherein the DNA or amino
acid sequences from other organisms are contained within a database, and wherein the
DNA or amino acid sequence encoded by Gene X is compared to the DNA or amino
acid sequences from other organisms using a computer algorithm.

12. The method of claim 11, wherein the computer algorithm is blastp,
tblastn or another algorithm that utilizes string alignments.

13. The method of claim 6, further comprising the steps of:

a) disrupting the function of Gene X or its homolog in a yeast cell; and

b) identifying whether the function of Gene X is essential for yeast
germination, vegetative growth, pseudohyphal or hyphal growth.

14. A method for identifying a potential inhibitor of a regulon target
gene, comprising the steps of:

a) incubating a polypeptide comprising an amino acid sequence
encoded by a regulon target gene with a compound under
conditions effective to promote specific binding between the
polypeptide and the compound; and

b) determining whether the polypeptide bound to the compound;

wherein the compound is a potential inhibitor if the compound binds to
the polypeptide.

15. The method of claim 14, wherein the polypeptide comprises the
full-length amino acid sequence encoded by the regulon target gene.

16. The method of claim 14, wherein the polypeptide comprises a 
functional fragment of the amino acid sequence encoded by the regulon target gene. 

17. The method of claim 14, wherein the polypeptide is a fusion 
protein comprising an epitope tag or reporter gene. 

18. The method of claim 14, wherein the polypeptide is attached to a 
solid support surface and the compound is in mobile phase. 

19. The method of claim 14, wherein the compound is attached to a
solid support surface and the polypeptide is in mobile phase. 

20. The method of claim 14, wherein the compound is a library 
selected from the group consisting of a combinatorial small organic library, a phage 
display library and a combinatorial peptide library.

21. The method of claim 14, wherein said determining is performed by
ELISA, RIA or BiaCORE analysis. 

22. The method of claim 14, wherein said determining is performed by 
high throughput screening. 

23. The method of claim 14, further comprising the step, performed
before step a), of expressing in a host cell a regulon target gene.

24. The method of claim 14, wherein the target gene is selected from
the group consisting of YMR134w, YER034w, YJL105w, YKL077w, YGR046w,
YJR041c, YER044c and YLR100w and their mammalian homologs.

25. The method of claim 14, wherein the target gene is human EST
W28235, a homolog of YER044c.

26. The method of claim 14, wherein the target gene is human EST
R92053, a homolog of YLR100w.

27. The method of claim 14, wherein the target gene is mouse EST
AI386195, a homolog of YER044c.

28. The method of claim 14, wherein the target gene is mouse EST
AI226514, a homolog of YLR100w.

29. The method of claim 14, wherein the target gene is mouse EST
AI528381, a homolog of YLR100w.

30. The method of claim 14, wherein the target gene is mouse gene
3319971, a homolog of YLR100w.

31. The method of claim 14, wherein the target gene is rat gene
1397235, a homolog of YLR100w.

32. The method of claim 14, further comprising performing, before 
step a), the step of expressing in a host cell a regulon target gene selected from the 

20 group consisting of YMR134w, YER034w, YJIJOSw, YKL077w, YGR046w, YJR041c, 

YER044c and YLRlOOw and their mammalian homologs. 
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33. A method for identifying a potential inhibitor of a regulon target 
gene, comprising the steps of: 

a) creating a host cell in which the target gene has been altered or 
inactivated by mutation; 

b) comparing gene expression profiles in the mutated host cell to 
those in a host cell which expresses the normal target gene; 

c) identifying one or more potential target-dependent reporter 
genes whose expression is altered in the host cell in which the 
target gene has been altered or inactivated compared to the host 
cell which expresses the normal target gene, 

d) screening one or more compounds for their effects on 
expression of the target-dependent reporter gene, 

wherein if expression of the target-dependent reporter gene increases in 
the host cell harboring an altered or inactivated target gene, then a potential inhibitor 
of the regulon target gene will increase expression of the target-dependent reporter 
gene, and if expression of the target-dependent reporter gene decreases in the host cell 
harboring an altered or inactivated target gene, then a potential inhibitor of the regulon 
target gene will decrease expression of the target-dependent reporter gene. 
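
The logic of steps b) through d) can be illustrated with a minimal sketch. It is only an illustration under stated assumptions: the gene names, the expression values, the two-fold (log2 ratio of 1.0) threshold and the function names `find_reporters` and `expected_inhibitor_effect` are hypothetical and do not come from the specification.

```python
import math

def find_reporters(mutant, wild_type, min_abs_log2=1.0):
    """Return {gene: log2(mutant / wild type)} for genes whose expression
    changes by at least the (assumed) threshold in the mutated host cell."""
    reporters = {}
    for gene, wt_level in wild_type.items():
        log2_ratio = math.log2(mutant[gene] / wt_level)
        if abs(log2_ratio) >= min_abs_log2:
            reporters[gene] = log2_ratio
    return reporters

def expected_inhibitor_effect(log2_ratio):
    """An inhibitor of the target gene is expected to mimic the mutation:
    reporters that rise in the mutant should rise, and vice versa."""
    return "increase" if log2_ratio > 0 else "decrease"

# Hypothetical expression levels (arbitrary units), not data from the figures.
wild_type = {"geneA": 1.0, "geneB": 4.0, "geneC": 2.0}
mutant = {"geneA": 4.0, "geneB": 1.0, "geneC": 2.2}

reporters = find_reporters(mutant, wild_type)
effects = {gene: expected_inhibitor_effect(r) for gene, r in reporters.items()}
print(effects)  # {'geneA': 'increase', 'geneB': 'decrease'}
```

In this toy data set geneC changes too little to serve as a reporter, while a compound that raises geneA or lowers geneB would be flagged as a potential inhibitor of the target gene.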

34. The method of claim 33, further comprising the step, performed 
before step d), of assessing the specificity of a potential target-dependent reporter gene 
by comparing gene expression profiles of the potential target-dependent reporter gene to 
a plurality of genes in a database of compiled gene expression profiles to generate 
individual expression correlation coefficients, wherein a target-dependent reporter gene 
whose expression correlates with the expression of the regulon target gene and with a 
minimal number or no other gene is selected over one whose expression correlates 
with a greater number of genes based on expression correlation coefficients. 
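
The specificity assessment above can be sketched as follows. This is an illustrative sketch, not the applicants' implementation: it assumes the Pearson coefficient as the expression correlation coefficient and a cutoff of 0.8, and the profiles, gene names and function names are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)

def count_other_correlated(candidate, target, database, threshold=0.8):
    """Count genes, other than the candidate and the target, whose profile
    correlates with the candidate at or above the (assumed) threshold."""
    ref = database[candidate]
    return sum(
        1
        for gene, profile in database.items()
        if gene not in (candidate, target) and pearson(ref, profile) >= threshold
    )

def most_specific_reporter(candidates, target, database, threshold=0.8):
    """Among candidates that correlate with the target, prefer the one that
    correlates with the fewest other genes."""
    eligible = [
        c for c in candidates
        if pearson(database[c], database[target]) >= threshold
    ]
    return min(
        eligible,
        key=lambda c: count_other_correlated(c, target, database, threshold),
    )

# Hypothetical expression profiles across four experiments.
database = {
    "target": [4.0, 4.0, 2.0, 2.0],
    "repA": [4.4, 3.6, 2.4, 1.6],
    "repB": [4.4, 3.6, 1.6, 2.4],
    "geneX1": [5.0, 3.0, 3.0, 1.0],
    "geneX2": [4.8, 3.2, 2.8, 1.2],
}
best = most_specific_reporter(["repA", "repB"], "target", database)
print(best)  # repB
```

Both candidates track the target, but repA also correlates with three other genes while repB correlates with only one, so repB is the more specific reporter in this toy data set.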

35. The method of claim 33 or 34, wherein upstream sequences that 
control expression of the target-dependent reporter gene are fused to a heterologous 
coding sequence and that fusion is used to screen compounds for potential inhibitors of 
the regulon target gene. 

36. The method of claim 35, wherein the heterologous sequence 
comprises an epitope tag or a reporter gene. 

37. The method of claim 35, wherein the fusion polypeptide is attached 
to a solid support surface and the compound is in mobile phase. 

38. The method of claim 35, wherein the compound is attached to a 
solid support surface and the fusion polypeptide is in mobile phase. 

39. The method of claim 33, wherein the compound is a library 
selected from the group consisting of a combinatorial small organic library, a phage 
display library and a combinatorial peptide library. 

40. The method of claim 33, wherein said screening is performed by 
ELISA, RIA or BiaCORE analysis. 

41. The method of claim 33, wherein said screening is performed by 
high throughput screening. 

42. The method of claim 33, wherein the target gene is selected from 
the group consisting of YMR134w, YER034w, YJL105w, YKL077w, YGR046w, 
YJR041c, YER044c and YLR100w and their mammalian homologs. 

44. The method of claim 33, wherein the target gene is human EST 
W28235, a homolog of YER044c. 
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45 . The method of claim 33, wherein the target gene is human EST 
R92053, a homolog of YLR100w. 

46. The method of claim 33, wherein the target gene is mouse EST 
AI386195, a homolog of YER044c. 

47. The method of claim 33, wherein the target gene is mouse EST 
AI226514, a homolog of YLR100w. 

48. The method of claim 33, wherein the target gene is mouse EST 
AI528381, a homolog of YLR100w. 

49. The method of claim 33, wherein the target gene is mouse gene 
3319971, a homolog of YLR100w. 

50. The method of claim 33, wherein the target gene is rat gene 
1397235, a homolog of YLR100w. 

51. A method for inhibiting the expression of a regulon target gene in a 
host cell comprising the step of introducing into the host cell an inhibitor made 
according to any one of claims 

52. The method of claim 51, wherein the target gene is selected from 
the group consisting of YMR134w, YER034w, YJL105w, YKL077w, YGR046w, 
YJR041c, YER044c and YLR100w and their mammalian homologs. 

53. An antisense oligonucleotide comprising a sequence 
complementary to the sequence of an mRNA of a regulon target gene and effective to 
decrease transcription or translation of the gene. 
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54. The antisense oligonucleotide of claim 53 complementary to the 
sequence of the mRNA of a target gene selected from the group consisting of 
YMR134w, YER034w, YJL105w, YKL077w, YGR046w, YJR041c, YER044c and 
YLR100w and their mammalian homologs. 

55. A ribozyme comprising a sequence complementary to the sequence 
of an mRNA of a regulon target gene and effective to decrease transcription or 
translation of the gene. 

56. The ribozyme of claim 55 complementary to the sequence of the 
mRNA of a target gene selected from the group consisting of YMR134w, YER034w, 
YJL105w, YKL077w, YGR046w, YJR041c, YER044c and YLR100w and their 
mammalian homologs. 

57. A neutralizing antibody to a protein encoded by a regulon target gene of a 
yeast or its mammalian homolog. 

58. The neutralizing antibody of claim 57, wherein the target gene is selected from 
the group consisting of YMR134w, YER034w, YJL105w, YKL077w, YGR046w, 
YJR041c, YER044c and YLR100w and their mammalian homologs. 

59. A fusion protein comprising an amino acid sequence encoded by a 
regulon target gene of a yeast or its mammalian homolog and further comprising an 
epitope tag or a reporter gene. 

60. The fusion protein of claim 59, wherein the target gene is selected 
from the group consisting of YMR134w, YER034w, YJL105w, YKL077w, YGR046w, 
YJR041c, YER044c and YLR100w and their mammalian homologs. 
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61. A method for identifying a gene regulated by a regulon target gene 
of a yeast or its mammalian homolog, comprising the steps of: 

a) overexpressing the target gene in host cells of a matrix 
comprising a plurality of units of cells, the cells in each unit 
containing a reporter gene operably linked to an expression 
control sequence derived from a gene of a selected organism; 
and 

b) identifying genes that are either induced or repressed by 
overexpression of the target gene. 

62. The method according to claim 61, wherein the target gene is 
selected from the group consisting of YMR134w, YER034w, YJL105w, YKL077w, 
YGR046w, YJR041c, YER044c and YLR100w and their mammalian homologs. 

63. A method for identifying a regulon indicator gene in a database of 
compiled gene expression profiles, wherein expression of the regulon indicator gene 
provides a measure of the function of a biological pathway or process of interest, the 
method comprising the steps of: 

a) examining exemplary expression profiles in response to one or 
more chemical or genetic treatments which target the pathway 
or process of interest to generate reporter sensitivity data; 

b) selecting a set of genes from a) which comprises one or more 
genes most significantly affected in response to the treatment or 
treatments; and 

c) selecting at least one gene from b) whose expression profile is 
maximized for its specificity and sensitivity to the treatment or 
class of treatments in a) compared to its sensitivity to all other 
treatments in the database. 
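
Steps a) through c) can be sketched in a few lines. This is a hedged illustration only: the scoring rule (mean absolute log ratio under the pathway treatments minus mean absolute log ratio under all other treatments), the experiment labels, the values and the name `select_indicator` are assumptions made for the example, not the method of the specification.

```python
def mean(values):
    values = list(values)
    return sum(values) / len(values)

def select_indicator(database, pathway_expts, top_n=2):
    """Pick one indicator gene from a {experiment: {gene: log ratio}} table."""
    genes = list(next(iter(database.values())))
    others = [e for e in database if e not in pathway_expts]

    # Step a): sensitivity = mean absolute response to the pathway treatments.
    sensitivity = {
        g: mean(abs(database[e][g]) for e in pathway_expts) for g in genes
    }

    # Step b): keep the genes most strongly affected by those treatments.
    shortlist = sorted(genes, key=sensitivity.get, reverse=True)[:top_n]

    # Step c): favour genes that respond little to all other treatments.
    def score(g):
        background = mean(abs(database[e][g]) for e in others)
        return sensitivity[g] - background

    return max(shortlist, key=score)

# Hypothetical log ratios; experiment names are placeholders.
database = {
    "statin_1": {"geneA": 3.0, "geneB": 3.2, "geneC": 0.1},
    "statin_2": {"geneA": 2.8, "geneB": 3.0, "geneC": 0.2},
    "heat_shock": {"geneA": 0.1, "geneB": 2.5, "geneC": 0.0},
    "osmotic": {"geneA": 0.2, "geneB": 2.9, "geneC": 0.1},
}
indicator = select_indicator(database, ["statin_1", "statin_2"])
print(indicator)  # geneA
```

Here geneB is slightly more sensitive to the pathway treatments than geneA, but it also responds to unrelated treatments, so geneA is the better indicator under this scoring rule.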
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64. The method of claim 63, wherein the regulon indicator gene is 
co-regulated with one or more genes in the set of genes of a). 

65. The method of claim 63, wherein the regulon indicator gene, upon 
expression, controls the expression of at least one other gene in the set of genes of a). 

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YJL105w 



GenBank No. 1008286 

Chromosome X 

Protein 559 amino acids 

63,867 Daltons 

Comments: contains a PHD finger 



Figure 1. 
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Figure 2. 
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Regulated Expression of YJL105w 



Expt   Level   Natural Log Ratio   Treatment [baseline] 

1455   9.1   +3.2   4.0ug/ml Fluvastatin - 18 hr [0.09] 
1454   8.1   +3.1   8.0ug/ml Fluvastatin - 18 hr [0.13] 
1537   7.9   +3.1   20ug/ml Lovastatin in 1 Ethanol - 18 hr [0.10] 
1420   7.8   +3.1   20ug/ml Atorvastatin in 1 DMSO - 18 hr [0.14] 
3455   7.8   +3.1   20ug/ml Lovastatin - 18 hr [0.20] 
3456   7.8   +3.1   25ug/ml Lovastatin - 18 hr [0.20] 
1944   6.5   +2.9   30ug/ml Mevastatin in 1.5 Ethanol - 18 hr [0.20] 
1943   6.4   +2.9   15ug/ml Simvastatin in 1.5 Ethanol - 18 hr [0.13] 
1554   5.8   +2.8   5ug/ml Simvastatin in 1 Ethanol - 18 hr [0.12] 
1419   5.2   +2.7   30ug/ml Atorvastatin in 1 DMSO - 18 hr [0.12] 
1553   5.1   +2.6   10ug/ml Simvastatin in 1 Ethanol - 18 hr [0.11] 
3454   5.1   +2.6   10ug/ml Lovastatin - 18 hr [0.15] 
1538   4.8   +2.6   10ug/ml Lovastatin in 1 Ethanol - 18 hr [0.09] 
1421   4.4   +2.5   10ug/ml Atorvastatin in 1 DMSO - 18 hr [0.12] 
1541   4.2   +2.4   10ug/ml Mevastatin in 1 Ethanol - 18 hr [0.08] 
1456   4.1   +2.4   2.0ug/ml Fluvastatin - 18 hr [0.06] 
1539   4.0   +2.4   5ug/ml Lovastatin in 1 Ethanol - 18 hr [0.08] 
1540   4.0   +2.4   20ug/ml Mevastatin in 1 Ethanol - 18 hr [0.10] 
2756   3.9   +2.4   [hmgs - ABY244.1 regulated (60)] - 18 hr [0.21] 
2757   3.8   +2.3   [hmgs - ABY244.1 regulated (80)] - 18 hr [0.20] 
2061   3.3   +2.2   35ug/ml Atorvastatin in 1 Ethanol - 18 hr [0.08] 
1982   3.0   +2.1   0.125ug/ml Clotrimazole in 1 Methanol - 18 hr [0.19] 
2060   2.9   +2.1   25ug/ml Atorvastatin in 1 Ethanol - 18 hr [0.07] 
1542   2.8   +2.0   5ug/ml Mevastatin in 1 Ethanol - 18 hr [0.08] 
1999   2.7   +2.0   20ug/ml Atorvastatin in 1 Ethanol - 18 hr [0.08] 
3279   2.7   +2.0   0.15ug/ml Clotrimazole in 1 DMSO - 18 hr [0.13] 
1935   2.6   +2.0   0.04ug/ml Econazole in 1 Methanol - 18 hr [0.18] 
1478   2.5   +1.9   2.0ug/ml Fluconazole in 0.9 Saline - 18 hr [0.27] 
1477   2.5   +1.9   3.0ug/ml Fluconazole in 0.9 Saline - 18 hr [0.31] 
1983   2.5   +1.9   0.15ug/ml Clotrimazole in 1 Methanol - 18 hr [0.15] 
3468   2.5   +1.9   20ug/ml Lovastatin [ABY139] - 18 hr [0.58] 
2754   2.5   +1.9   [hmgs - ABY244.1 regulated (20)] - 18 hr [0.19] 



Figure 3. 
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Wild-Type YJL105w Knockout 



Figure 4. 
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YMR134w 

GenBank No. 606432 

Chromosome XIII 

Protein 236 amino acids 

27,911 Daltons 

Comments: involved in iron metabolism; potential transmembrane domain 



Figure 5. 
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Figure 6. 
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Treatments Causing Highest Expression of YMR134w 



Experiment   Level   log ratio   Treatment [baseline] 

1943   1.3   +1.8   15ug/ml Simvastatin in 1.5 Ethanol - 18 hr [0.13] 
1944   1.2   +1.7   30ug/ml Mevastatin in 1.5 Ethanol - 18 hr [0.20] 
1419   1.2   +1.7   30ug/ml Atorvastatin in 1 DMSO - 18 hr [0.12] 
1537   1.2   +1.7   20ug/ml Lovastatin in 1 Ethanol - 18 hr [0.10] 
1454   1.2   +1.7   8.0ug/ml Fluvastatin - 18 hr [0.13] 
1477   1.0   +1.5   3.0ug/ml Fluconazole in 0.9 Saline - 18 hr [0.31] 
1553   0.9   +1.5   10ug/ml Simvastatin in 1 Ethanol - 18 hr [0.11] 
1455   0.9   +1.5   4.0ug/ml Fluvastatin - 18 hr [0.09] 
3455   0.9   +1.5   20ug/ml Lovastatin - 18 hr [0.20] 
3456   0.9   +1.5   25ug/ml Lovastatin - 18 hr [0.20] 
1538   0.9   +1.4   10ug/ml Lovastatin in 1 Ethanol - 18 hr [0.09] 
3454   0.9   +1.4   10ug/ml Lovastatin - 18 hr [0.15] 
1478   0.8   +1.4   2.0ug/ml Fluconazole in 0.9 Saline - 18 hr [0.27] 
1540   0.8   +1.3   20ug/ml Mevastatin in 1 Ethanol - 18 hr [0.10] 
1420   0.8   +1.3   20ug/ml Atorvastatin in 1 DMSO - 18 hr [0.14] 
1611   0.8   +1.3   10ug/ml Fluconazole - 21 hr [0.04] 
1554   0.7   +1.2   5ug/ml Simvastatin in 1 Ethanol - 18 hr [0.12] 
3279   0.7   +1.2   0.15ug/ml Clotrimazole in 1 DMSO - 18 hr [0.13] 
3469   0.7   +1.2   25ug/ml Lovastatin [ABY139] - 18 hr [0.57] 
1605   0.7   +1.2   5ug/ml Fluconazole - 21 hr [0.04] 
1936   0.7   +1.1   0.05ug/ml Econazole in 1 Methanol - 18 hr [0.14] 
3468   0.7   +1.1   20ug/ml Lovastatin [ABY139] - 18 hr [0.58] 



Figure 7. 
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YER044c 

GenBank No. 603277 

Chromosome V 

Protein 148 amino acids 

17,140 Daltons 

Comments: unknown function; potential transmembrane domain 



Figure 9. 
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ERG2 



Figure 10. 
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Treatments Causing Highest Expression of YER044c 



Experiment   Level   log ratio   Treatment [baseline] 

1419   4.2   +1.7   30ug/ml Atorvastatin in 1 DMSO - 18 hr [0.12] 
1420   3.6   +1.5   20ug/ml Atorvastatin in 1 DMSO - 18 hr [0.14] 
1617   3.3   +1.4   20ug/ml Fluconazole - 21 hr [0.04] 
1454   3.2   +1.4   8.0ug/ml Fluvastatin - 18 hr [0.13] 
1537   3.1   +1.4   20ug/ml Lovastatin in 1 Ethanol - 18 hr [0.10] 
1943   3.0   +1.3   15ug/ml Simvastatin in 1.5 Ethanol - 18 hr [0.13] 
1623   3.0   +1.3   100ug/ml Fluconazole - 21 hr [0.04] 
3456   3.0   +1.3   25ug/ml Lovastatin - 18 hr [0.20] 
3455   3.0   +1.3   20ug/ml Lovastatin - 18 hr [0.20] 
1611   2.9   +1.3   10ug/ml Fluconazole - 21 hr [0.04] 
1553   2.7   +1.2   10ug/ml Simvastatin in 1 Ethanol - 18 hr [0.11] 
3454   2.5   +1.1   10ug/ml Lovastatin - 18 hr [0.15] 
1605   2.5   +1.1   5ug/ml Fluconazole - 21 hr [0.04] 
3279   2.5   +1.1   0.15ug/ml Clotrimazole in 1 DMSO - 18 hr [0.13] 
1455   2.4   +1.1   4.0ug/ml Fluvastatin - 18 hr [0.09] 
1669   2.4   +1.1   100ug/ml Fluconazole - 8 hr [0.05] 





Figure 11. 
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Mouse EST with similarity to YER044c 



gb|AI386195|AI386195 mq60h05.y1 Soares 2NbMT Mus musculus cDNA clone 
IMAGE: 583161 5' similar to SW:YEN4_YEAST P40030 HYPOTHETICAL 
17.1 KD PROTEIN YER044c. ;, mRNA sequence 
[Mus musculus] 
Length = 455 

Score = 81.5 bits (198), Expect = 6e-15 
Identities = 40/114 (35%), Positives = 68/114 (59%) 
Frame = +3 

Query: 23 LPKWLLFISIVSVFNSIQTYVSGLELTRKVYERKPTETTHLSARTFGTWTFISCVIRFYG 82 

L WL+ +SI+++ N++Q++ L K+Y KP L ARTFG WT +S VIR 

Sbjct: 93 LRSWLVMVSIIAMGNTLQSFRDHTFLYEKLYTGKPNLVNGLQARTFGIWTLLSSVIRCLC 272 

Query: 83 AMYLNEPHIFELVFMSYMVALFHFGSELLIFRTCKLGKGFMGPLVVSTTSLVWM 136 

A+ ++ ++ + ++++AL HF SEL +F T G + PL+V++ S++ M 

Sbjct: 273 AIDIHNKTLYHITLWTFLLALXHFLSELFVFGTAAPTVGVLAPLMVASFSILGM 434 



Human EST with similarity to YER044c 



gb|W28235|W28235 43h8 Human retina cDNA randomly primed sublibrary 
Homo sapiens cDNA. 
Length = 839 

Score = 69.9 bits (168), Expect = 2e-11 
Identities = 33/94 (35%), Positives = 55/94 (58%) 
Frame = +1 

Query: 23 LPKWLLFISIVSVFNSIQTYVSGLELTRKVYERKPTETTHLSARTFGTWTFISCVIRFYG 82 

L WL+ +SI+++ N++Q++ L K+Y KP L ARTFG WT +S VIR 

Sbjct: 112 LRSWLVMVSIIAMGNTLQSFRDHTFLYEKLYTGKPNLVNGLQARTFGIWTLLSSVIRCLC 291 

Query: 83 AMYLNEPHIFELVFMSYMVALFHFGSELLIFRTC 116 

A+ ++ ++ + ++++AL HF SEL + C 
Sbjct: 292 AIDIHNKTLYHITLWTFLLALGHFLSELFVLWNC 393 



Figure 13. 
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Rat EST with similarity to YER044c 



gb|AI172515|AI172515 UI-R-C2p-nu-d-02-0-UI.s1 UI-R-C2p Rattus 
norvegicus cDNA clone UI-R-C2p-nu-d-02-0-UI 3', mRNA 
sequence [Rattus norvegicus] 
Length = 475 

Score = 80.8 bits (196), Expect = 1e-14 
Identities = 40/114 (35%), Positives = 68/114 (59%) 
Frame = -3 

Query: 23 LPKWLLFISIVSVFNSIQTYVSGLELTRKVYERKPTETTHLSARTFGTWTFISCVIRFYG 82 

L WL+ +SI+++ N++Q++ L K+Y KP L ARTFG WT +S VIR 

Sbjct: 404 LRSWLVMVSIIAMGNTLQSFRDHTFLYEKLYTGKPNLVNGLQARTFGIWTLLSSVIRCLC 225 

Query: 83 AMYLNEPHIFELVFMSYMVALFHFGSELLIFRTCKLGKGFMGPLVVSTTSLVWM 136 

A+ ++ ++ + ++++AL HF SEL +F T G + PL+V++ S++ M 

Sbjct: 224 AIDIHNKTLYHITLWTFLLALGHFLSELFVFGTAAPTVGVLAPLMVASFSILGM 63 



Figure 14. 
YLR100w 

GenBank No. 1360483 

Chromosome XII 

Protein 347 amino acids 

39,725 Daltons 

Comments: unknown function; see S. Huang et al., Biochemistry, 26, pp. 8242-46 (1987) 



Figure 15. 
YLR100w 




CYB5 



Figure 16. 
Treatments Causing Highest Expression of YLR100w 



Experiment   Level   Treatment [baseline] 

6092   8.3   20ug/ml Lovastatin in 1 Ethanol [ABY12.1] - 24 hr [0.15] 
8717   6.7   10ug/ml Simvastatin in 1 DMSO [ABY12.1] - 24 hr [0.14] 
6093   6.3   10ug/ml Lovastatin in 1 Ethanol [ABY12.1] - 24 hr [0.16] 
8716   6.1   7.5ug/ml Simvastatin in 1 DMSO [ABY12.1] - 24 hr [0.13] 
8715   4.9   5ug/ml Simvastatin in 1 DMSO [ABY12.1] - 24 hr [0.12] 
6094   4.4   5ug/ml Lovastatin in 1 Ethanol [ABY12.1] - 24 hr [0.13] 
8705   2.7   [erg11 - ABY210 regulated (100)] - 24 hr [0.17] 
6088   2.6   0.1ug/ml Sulconazole in 1 DMSO [ABY12.1] - 24 hr [0.12] 
8341   2.5   0.025ug/ml Miconazole in 1 DMSO [ABY12.1] - 24 hr [0.15] 
8460   2.4   0.1ug/ml Clotrimazole in 1 DMSO [ABY12.1] - 24 hr [0.12] 
8462   2.3   0.135ug/ml Clotrimazole in 1 DMSO [ABY12.1] - 24 hr [0.17] 
8461   2.3   0.12ug/ml Clotrimazole in 1 DMSO [ABY12.1] - 24 hr [0.14] 
8342   2.3   0.03ug/ml Miconazole in 1 DMSO [ABY12.1] - 24 hr [0.19] 
8703   2.1   [erg11 - ABY210 regulated (80)] - 24 hr [0.14] 
8340   2.0   0.02ug/ml Miconazole in 1 DMSO [ABY12.1] - 24 hr [0.12] 
8463   2.0   0.15ug/ml Clotrimazole in 1 DMSO [ABY12.1] - 24 hr [0.25] 
8701   1.9   [erg11 - ABY210 regulated (60)] - 24 hr [0.14] 



Figure 17. 
Alignment of YLR100w to Mammalian ESTs 

gb|AI226514|AI226514 uj07d08.y1 Sugano mouse liver mlia Mus musculus cDNA clone 
IMAGE: 1891215 5' similar to TR:Q62904 Q62904 
OVARIAN-SPECIFIC PROTEIN. ;, mRNA sequence [Mus 
musculus] Length = 1039 

Score = 63.2 bits (151), Expect = 5e-09 

Identities = 53/223 (23%), Positives = 108/223 (47%), Gaps = 11/223 (4%) 

Query: 3 RKVAIVTGTNSNLGLNIVFRLIETEDTNVRLTIVVTSRTLPRVQEVINQIKDFYNKSGRV 62 

RKV ++TG +S +GL + RL+ +D L++RL++V++ ++ 
Sbjct: 52 RKWLITGASSGIGLALCGRLLAEDDD LHLCLACRNLSKARAVRDTLLASHPSA 213 

Query: 63 EDLEIDFDYLLVD FTNMVSVLNAYYDINKKYRAINYLFVNAA QGIFDGIDW 113 

+ + +D +++ SV+ ++ +K++ ++YL++NA + F GI + 

Sbjct: 214 EVSIVQMDVSSLQSVVRGAEEVKQKFQRLDYLYLNAGILPNPQFNLKAFFCGI-F 375 

Query: 114 IGAVKEVFTNPLEAVTNPTYKIQLVGVKSKDDMGLIFQANVFGPYYFISKILPQLTRGK- 172 

V +FT E + + G++ +F+ N+FG + I ++ P L 

Sbjct: 376 S RNVIHMFTTA- EGI LTQNDSVTADGLQE VFETNLFGHFILIRELEPLLCHADN 534 

Query: 173 -AYIVWISSIMSDPKYLSLNDIELLKTNASYEGSKRLVDLLHLATYKDLKKLGI 225 

+ ++W SS + SL DI+ K Y + DLL++A ++ K G+ 

Sbjct: 535 PSQLIWTSSRNAKKANFSLEDIQHFKGPEPYSSFQYATDLLNVAXNREFKPEGL 696 

gb|AI528381|AI528381 ui96g06.y1 Sugano mouse liver mlia Mus musculus cDNA clone 
IMAGE: 1890298 5' similar to TR:Q62904 Q62904 
OVARIAN-SPECIFIC PROTEIN. ;, mRNA sequence [Mus 
musculus] Length = 837 

Score = 52.3 bits (123), Expect = 1e-05 

Identities = 59/260 (22%), Positives = 119/260 (45%), Gaps = 11/260 (4%) 

Query: 3 RKVAIVTGTNSNLGLNIVFRLIETEDTNVRLTIVVTSRTLPRVQEVINQIKDFYNKSGRV 62 

RKV ++TG +S +GL + RL+ +D L++RL++V++ ++ 
Sbjct: 52 RKWLITGASSGIGLALCGRLLAEDDD LHLCLACRNLSKARAVRDTLLASHPSA 213 

Query: 63 EDLEIDFDYLLVD FTNMVSVLNAYYDINKKYRAINYLFVNAA QGIFDGIDW 113 

+ + +D +++ SV+ ++ +K++ ++YL++NA + F GI + 

Sbjct: 214 EVSIVQMDVSSLQSWRGAEEVKQKFQRLDYLYLNAGILPNPQFNLKAFFCGI-F 375 

Query: 114 IGAVKEVFTNPLEAVTNPTYKIQLVGVKSKDDMGLIFQANVFGPYYFISKILPQLTRGK- 172 

V +FT E + + + D + +F+ N+ + I ++ P L 

Sbjct: 376 SRNVIHMFTTA-EGILTQNDSV TADRLQEVFETNLSCHFILIRELEPLLLHADN 534 

Query: 173 -AYIVWISSIMSDPKYLSLNDIELLKTNASYEGSKRLVDLLHLATYKDLKKLGINQYVVQ 231 

+ ++W SS + SL D + Y + DLL++A + + G+ + 

Sbjct: 535 PSQLIWTSSRNAXKANFSLEDXQHSIGPGPYSSFQYATDLLNVALNXNXNQKGLYSSRMC 714 

Query: 232 PGIFTSHSFSEYLNFFTYFGMLCLFYLARLL 262 

PG+ ++ TY G+L FYL LL 

Sbjct: 715 PGWMTN MTY-GILPPFYLDVLL 780 



Figure 19. 
gb|R92053|R92053 yp96c01.r1 Homo sapiens cDNA clone 195264 5'. Length = 454 
Score = 44.1 bits (102), Expect = 0.003 

Identities = 26/84 (30%), Positives = 40/84 (46%), Gaps = 2/84 (2%) 
Frame = +1 

Query: 150 FQANVFGPYYFISKILPQLTRGK--AYIVWISSIMSDPKYLSLNDIELLKTNASYEGSKR 207 

F+ NVFG + I ++ P L + ++W SS + SL D + K Y SK 

Sbjct: 1 FETNVFGHFILIRELEPLLCHSDNPSQLIWTSSRSARKSNFSLEDFQHSKGKEPYSSSKY 180 

Query: 208 LVDLLHLATYKDLKKLGINQYVVQPG 233 

DLL +A ++ + G+ V PG 
Sbjct: 181 ATDLLSVALNRNFNQQGLYSNVACPG 258 

Figure 19 (cont). 
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YER034w 

GenBank No. 603267 

Chromosome V 

Protein 185 amino acids 

21,186 Daltons 

Comments: unknown function; see S. Huang et al., Biochemistry, 26, pp. 8242-46 (1987) 



Figure 20 
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Figure 21. 
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Mutation of the YER034w Gene Leads 
to Increased Pseudohyphal Growth 




Wild Type yer034wΔ 



Figure 22. 
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YKL077w 

GenBank No. 486110 

Chromosome XI 

Protein 392 amino acids 

46,042 Daltons 

Comments: unknown function; potential transmembrane domain 



Figure 23 
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YKL077w 




SGV1 



Figure 24. 
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Expression Correlation of YKL077w 



Rank   Gene   Correlation   Exp   Function 

1   YKL077W   +1.00   0.5 - 9.1 
2   SGV1   +0.92   0.7 - 14.4   CDC28/cdc2 related protein kinase 
3   RHO1   +0.88   1.3 - 20.9   GTP-binding protein 
4   YKL075C   +0.86   0.2 - 2.5 
5   SRA3   +0.84   0.3 - 4.6   catalytic subunit of PKA 
6   RPB4   +0.84   0.3 - 7.8   subunit of RNA polymerase II 
7   PKC1   +0.84   0.6 - 11.7   putative protein kinase 



Figure 25. 
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YGR046w 



GenBank No. 1323049 

Chromosome VII 

Protein 385 amino acids 

44,219 Daltons 

Comments: essential gene in yeast 



Figure 27 
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YGR046w 




IRA2 



Figure 28. 
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YJR041c 



GenBank No. 1015693 

Chromosome X 

Protein 1173 amino acids 

135,096 Daltons 

Comments: essential gene in yeast; contains a leucine zipper; potential 
transmembrane domain 



Figure 31 






WO 00/58521 



PCT/US00/08604 




Figure 32. 
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HES1 



GenBank No. 1420543 

Chromosome XV 

Protein 433 amino acids 

49,502 Daltons 

Comments: implicated in ergosterol pathways; related to human oxysterol 
binding protein 



Figure 35 
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Expression Correlation to HES1 



Gene Correlation Exp Function 
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Figure 36. 
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B 










GRM Profile Comparisons 

Lower scores indicate a closer match to the reference profile. 
Select the experiment number for details of the comparison. 



Comparisons to experiment 395 - 15 Alpha factor (YM + Casamino Acids) 



Control data: 356 (225) 
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Figure 38 
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FIGURE 39. YJL105w DNA Sequence 

Sequence contains 1200bp of 5' promoter sequence. 
Symbols: 1 to: 2883 from: chr10.gcg ck: 4711, 223552 to: 226434 
Chromosome X Sequence 

EMBO J. 15:2031-2049 [8641269] (1996) Complete nucleotide 
sequence of Saccharomyces cerevisiae chromosome X. 
Galibert, F., Alexandraki, D., Baur, A., Boles, E., 
Chalwatzis, N., Chuat, J. C., Coster, F., Cziepluch, C., De 
Haan, M., Domdey, H., ... 

gcgseq.tmp.4454 Length: 2883 March 26, 1999 16:51 Type: N Check: 6274 .. 

1 TGGAAAAGCT CACTGTGAGG TTCCTTGGAG CCAATAGTAA TACAGCACAA 

51 TCCAAGGAAA AATCTGGCCT ATATGCAAGG AAGGAGAGAT AGTCAAAAGC 

101 ATTCTTTCCC CTAGAAGTTG GTGCATATAT GGCATCGTTA AAACATATTA 

151 CCCCCAAAAT TTCTTCTCTA AACGATGTGC TTGGCCTTTG TTTTGGTTTT 

201 TGATGTCGGT CGTTTGAGGC CCCTTGCGGA AAATCGAGAT CGCCGAATGG 

251 CACGCGAGGG AAGGGAAATA AGGTTTAAAG GCACTGAAAC AATAGGCAAG 

301 AAGTAGGCGA GAGCCGACAT ACGAGACTAA TGTGTCCGCG TTTCTAAGGC 

351 CACTTTTCAA TGAAACGGAT ATTGATATGC TAGTAAAAGG ACGAGCTCAA 

401 GAGCGAAAAT ATAAGTAAAG AATTCGAGTG CACTTGTCTC CATGCAGCAA 

451 GATTTCATAT GAGTCTTTTT TATCTTTTTA CTTTTTACAT TACACGATAT 

501 GCACTTTATG AAAATTTAAC GAGGTTGGAA GCCGGATAAT CAACCAAAAT 

551 CAGGCACGAA GGCACACTCG TATATGCATG TTGTTGAAAC TCTGTTACGC 

601 TGAACTAACA ATCACACATG TAGAGGTCAC CGGGAAAAGT TGCGACCCCA 

651 TGGAAGGTCG ATCTCTTCGT TTGGCTTTGC TTGGCTGGCG GCATTGCGCT 

701 TCTTCGCTTA TACCCGTCTC TTGACGCTCG AGCTCGTTCA TTGAGATACC 

751 TTTATTCTTG CACATTTTCT GGCTTTTTTC GCTACTCGGG TACATGTAAT 

801 CATGCACACA GAAGGTGCTG TAGGGTGAAA GTTCCTTTGT GCTGTCGTTT 

851 GTTTTTAATG CCAAACTTTC TGGTGATCAA TAACCACCTC TTTTTCCTTC 

901 AGGAAACCTT ATTATTGTTC TTGGATAGTA CTAGGAAGTA TATAAGGAAC 

951 CTCGATTTTG GTATTGCACG GCTATACACA TCTAAGAAAC TTTGTATAAA 

1001 AGGTGGCTAC CCTATTCATA GCTTGATATC AATAGGCCAT CTCATCACTT 

1051 TTTATTGAAA AGGAAAGGAG GGAAATATAT CTGATTCAAA TTACTTGTTT 

1101 GCTTCTCTTT AAGACAAAAG CATAGATAAT TTCAGCGTGG AACGCCGGAA 

1151 TAAGATTGGT ACCCTCGTCA GAAAGTTACA AATACCGCTT CATCTTCAAA 

1201 ATGACTTCAC CGGAATCACT ATCTTCTCGT CATATCAGGC AAGGAAGGAC 

1251 ATACACAACC ACAGACAAGG TCATATCGCG GTCGTCGTCG TACTCATCTA 

1301 ATAGTTCAAT GTCTAAAGAT TACGGCGATC ACACACCCTT GTCCGTCAGC 

1351 AGTGCAGCTT CAGAGACATT ACCCTCACCT CAGTATATGC CGATAAGGAC 

1401 ATTCAATACA ATGCCTACAG CTGGCCCAAC GCCTTTACAT TTATTTCAAA 

1451 ATGACAGGGG CATTTTCAAC CATCATTCTT CATCAGGCTC ATCAAAAACG 

1501 GCATCAACAA ATAAAAGAGG AATAGCAGCA GCAGTAGCAT TGGCAACTGC 

1551 TGCCACCATA CCATTTCCAC TGAAAAAACA GAATCAAGAT GATAATTCCA 

1601 AGGTCTCGGT AACACACAAT GAATCATCGA AAGAAAATAA AATTACACCC 

1651 TCCATGAGAG CAGAAGATAA CAAACCTAAA AATGGTTGCA TCTGCGGTTC 

1701 AAGTGACTCC AAGGATGAGT TGTTTATACA GTGTAACAAA TGTAAAACGT 

1751 GGCAGCACAA GTTATGTTAT GCTTTCAAAA AATCAGATCC AATAAAAAGA 
1801 GATTTTGTTT GCAAAAGATG TGACAGTGAT ACGAAAGTGC AGGTTAATCA 

1851 AGTAAAACCA ATGATATTCC CTAGAAAAAT GGGAGATGAG CGATTATTTC 

1901 AATTTTCATC CATAGTGACA ACTTCAGCAT CGAACACAAA TCAGCATCAA 

1951 CAGTCTGTGA ATAACATAGA GGAACAGCCC AAGAAACGTC AACTTCATTA 

2001 TACCGCCCCA ACAACTGAAA ATAGCAATAG TATACGGAAA AAATTGAGGC 

2051 AAGAAAAACT GGTAGTATCA AGCCACTTTC TGAAGCCACT ACTGAATGAG 

2101 GTAAGTTCTT CCAATGACAC GGAATTCAAA GCAATAACAA TATCAGAGTA 

2151 TAAGGACAAA TATGTTAAGA TGTTTATTGA TAACCATTAT GATGACGATT 

2201 GGGTTGTTTG TTCTAACTGG GAAAGCTCAA GGTCAGCTGA CATCGAGGTA 

2251 AGAAAATCAT CAAATGAAAG AGATTTTGGA GTCTTCGCTG CAGATTCTTG 

2301 TGTTAAAGGT GAGCTAATTC AAGAATATTT GGGCAAAATT GATTTTCAAA 

2351 AAAATTATCA GACAGATCCA AATAATGACT ATCGTTTGAT GGGAACGACA 

2401 AAACCTAAAG TACTTTTTCA TCCACATTGG CCTTTATATA TAGACTCTCG 

2451 AGAAACAGGC GGATTAACAA GATACATAAG ACGGAGTTGT GAGCCCAATG 

2501 TGGAACTAGT AACGGTAAGA CCGCTTGACG AAAAACCAAG AGGAGATAAT 

2551 GATTGTAGAG TTAAATTTGT TTTAAGGGCT ATAAGAGATA TTCGTAAGGG 

2601 AGAAGAGATA AGCGTAGAAT GGCAATGGGA TTTGAGAAAT CCTATTTGGG 

2651 AGATAATAAA TGCATCTAAA GATTTGGATT CCCTACCGGA TCCCGACAAG 

2701 TTCTGGTTGA TGGGGTCAAT AAAGACTATT TTAACAAATT GTGATTGTGC 

2751 ATGTGGGTAC TTGGGCCATA ATTGTCCAAT AACTAAAATC AAAAACTTTT 

2801 CTGAAGAATT CATGAGGAAT ACGAAGGAAT CCCTATCTAA TAAATCTTAC 

2851 TTTAATACAA TAATGCACAA CTGTAAGCCA TAA 



FIGURE 39 (cont). 
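
Because the listed sequence begins with 1200 bp of 5' promoter sequence, the open reading frame starts at position 1201. The short sketch below shows how that offset relates the DNA listing to the protein listing, using only the first thirty coding bases from the figure. It is an illustrative check rather than part of the disclosure: the function name `translate`, the constant names and the use of an `N` placeholder in place of the promoter bases are assumptions, and the standard genetic code is assumed.

```python
# Standard genetic code, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def translate(dna):
    """Translate a coding sequence codon by codon, stopping at a stop codon."""
    protein = []
    for pos in range(0, len(dna) - 2, 3):
        residue = CODON_TABLE[dna[pos:pos + 3]]
        if residue == "*":
            break
        protein.append(residue)
    return "".join(protein)

PROMOTER_LENGTH = 1200
# Placeholder standing in for the 1200 promoter bases (not real sequence).
promoter_placeholder = "N" * PROMOTER_LENGTH
# Positions 1201-1230 as listed in the figure.
first_coding_bases = "ATGACTTCACCGGAATCACTATCTTCTCGT"

sequence = promoter_placeholder + first_coding_bases
coding = sequence[PROMOTER_LENGTH:]
peptide = translate(coding)
print(peptide)  # MTSPESLSSR
```

The ten residues obtained match the first block of the YJL105w protein sequence.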
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FIGURE 40. YJL105W Protein Sequence 

EMBO J. 15:2031-2049 [8641269] (1996) Complete nucleotide sequence of 
Saccharomyces cerevisiae chromosome X. 

Galibert, F., Alexandraki, D., Baur, A., Boles, E., Chalwatzis, N., 
Chuat, J. C., Coster, F., Cziepluch, C., De Haan, M., Domdey, H., 
Durand, P., Entian, K. D., Gatius, M., Goffeau, A., Grivell, L. A., et al. 

105W Length: 560 March 26, 1999 16:52 Type: P Check: 103 

1 MTSPESLSSR HIRQGRTYTT TDKVISRSSS YSSNSSMSKD YGDHTPLSVS 
51 SAASETLPSP QYMPIRTFNT MPTAGPTPLH LFQNDRGIFN HHSSSGSSKT 
101 ASTNKRGIAA AVALATAATI PFPLKKQNQD DNSKVSVTHN ESSKENKITP 
151 SMRAEDNKPK NGCICGSSDS KDELFIQCNK CKTWQHKLCY AFKKSDPIKR 
201 DFVCKRCDSD TKVQVNQVKP MIFPRKMGDE RLFQFSSIVT TSASNTNQHQ 
251 QSVNNIEEQP KKRQLHYTAP TTENSNSIRK KLRQEKLVVS SHFLKPLLNE 
301 VSSSNDTEFK AITISEYKDK YVKMFIDNHY DDDWVVCSNW ESSRSADIEV 
351 RKSSNERDFG VFAADSCVKG ELIQEYLGKI DFQKNYQTDP NNDYRLMGTT 
401 KPKVLFHPHW PLYIDSRETG GLTRYIRRSC EPNVELVTVR PLDEKPRGDN 
451 DCRVKFVLRA IRDIRKGEEI SVEWQWDLRN PIWEIINASK DLDSLPDPDK 
501 FWLMGSIKTI LTNCDCACGY LGHNCPITKI KNFSEEFMRN TKESLSNKSY 
551 FNTIMHNCKP 
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FIGURE 41. YMR134w DNA Sequence 

Sequence contains 1200bp of 5' promoter sequence. 

Symbols: 1 to: 1914 from: chr13.gcg ck: 8335, 536637 to: 538550 

Chromosome XIII Sequence 

Nature 387:90-93 [97313268] (1997) The nucleotide sequence of 
Saccharomyces cerevisiae chromosome XIII. 

Bowman, S., Churcher, C., Badcock, K., Brown, D., Chillingworth, T., 
Connor, R., Dedman, K., Devlin, K., Gentles, S., Hamlin, N., Hunt, S., ... 

gcgseq.tmp. 31828 Length: 1914 March 26, 1999 16:58 Type: N Check: 3324 



1 TACAATAACA AGCCAGGTGC AAGGCAATAA TAACGGTACA AAGGTCTGTT 
51 TCACAGAAGG TCCAAAAGTT AGTAGCTACA CAAATCCGAA CACGCAATTT 
101 CAAACTCAAA ACATGATTAT GGATTTCAGT CAACGTTATC AGGAAGAATC 
151 TGAAAGAGAG TCAAATAATC GTTCAAATAT AACTTTACCA CACGACAGCA 
201 TTCAAATAGC TCAACAAATA TGGCCAAACA CGGATTTAAA TGTAGTACAA 
251 TCTTCACAAG ACCTCAACAC TCCAATGGCT ACGCAAACTG TTTTGGGTCG 
301 TCCTGAGTCG CTAATTGTAC AGCCATTGGA GGTTTCTCAA TCTCCACCAA 
351 ACACTACCAA CTGCCTTCCT AATGCAGAAA ACAAAAAGAA AAAAGTCGAC 
401 ACCACTTCTG ATTTTACTTC AAGAAAGGAG ATTGCTCTGT GTAAAACTGG 
451 TTTATTAGAA ACTATTCATA TACCAAAGGA AAGGGAAAGT CAGATGCAAA 
501 GCGTCACTGG TTTAGATGCA ACACCAACGA TTATATGGAG CCCCGGGAAA 
551 GACAACACGG CGAAGAAAAA TACCAGTAAT AAGAAAAATA TTGATGATAA 
601 ACTAACAAAC CCCCAAAAAT CTGGAAATAC ACATACCCCT GATAGAAATA 
651 AAGAAGTGCT ACCTAACGGC ACACTTAATG AAACGAGGAA AGAAGCATCG 
701 CCAAGCGAAG GATTAACGAT AAGAGTTAAA AACGTTAATC GGAATGCGTC 
751 AAGAAAAATA TCTAAGCGGC TAATCAAGGA AAAGTTGAAA GACGAAGAAT 
801 TCATGAAATG GGTATGTATG CATTTGCAAG AAACTGAGCT GTTTCCCCCT 
851 CTTATCCACT CATTTTCTCT GACTTGACAA AGAAATACTA ACTAACAACT 
901 TTTGCCACTA CAAATATGAA TGAAAAGGTT AATAAGGTTG AAACGGTTCT 
951 CAATAAAATG TTCGAAAAGT GAACCCTTTT TTTGCAATTC CTTTTTACAC 
1001 TAGCCACGAA GTAAAATGGA AAAGTAAACC CGAGTTTCGG CAATATCGCT 
1051 AAGCAAGAAG AGCAAGCTCG TTTAAGTAAG CCTTTATGAA AAAAAAACAA 
1101 AATATAAAGC ATTATAAAAA TTGAATCACA TCGCAAATCT GCAATATACT 
1151 TGGAAGTGTT TATAGCAAAG TGTGGTATAG AAAAAGAACC AAAGGCCGGT 
1201 ATGTCGTTAA AGGATAGGTA TCTAAATCTC GAATTAAAAT TAATAAATAA 
1251 ACTACAGGAG TTGCCATATG TTCATCAATT TATCCATGAT CGAATAAGTG 
1301 GTAGGATAAC TCTCTTTTTG ATAGTGGTTG GTACGCTTGC ATTTTTTAAC 
1351 GAACTGTATA TAACGATCGA AATGAGTCTT CTACAAAAGA ACACATCAGA 
1401 AGAACTAGAG CGTGGAAGAA TCGATGAAAG TCTGAAGCTT CATCGGATGT 
1451 TGGTGAGTGA TGAATATCAC GGTAAAGAAT ACAAAGACGA GAAAAGCGGT 
1501 ATTGTTATTG AAGAGTTCGA AGATCGCGAT AAGTTTTTTG CAAAACCTGT 
1551 GTTTGTATCA GAATTGGATG TCGAATGTAA TGTTATTGTA GATGGGAAAG 
1601 AACTTCTGTC CACCCCATTA AAATTTCATG TTGAATTTTC TCCAGAGGAT 
1651 TATGAAAATG AAAAAAGACC TGAGTTTGGT ACTACCTTGC GTGTATTGAG 
1701 GCTGAGACTT TACCACTACT TTAAAGATTG CGAAATATAT CGCGATATAA 
1751 TTAAGAATGA GGGCGGTGAA GGGGCAAGAA AGTTTACGAT TTCCAACGGT 
1801 GTCAAAATTT ACAATCATAA AGATGAACTA CTGCCATTGA ATATCGATGA 
1851 TGTTCAATTA TGTTTCCTGA AGATTGATAC GGGAAACACG ATAAAATGCG 
1901 AATTCATACT ATGA 
FIGURE 42. YMR134w Protein Sequence

Nature 387:90-93 [97313268] (1997) The nucleotide sequence of 
Saccharomyces cerevisiae chromosome XIII. 

Bowman, S., Churcher, C., Badcock, K., Brown, D., Chillingworth, T.,
Connor, R., Dedman, K., Devlin, K., Gentles, S., Hamlin, N., Hunt, S.,
Jagels, K., Lye, G., Moule, S., Odell, C., Pearson, D., Rajandream, et al.

YMR134W Length: 237 March 26, 1999 16:59 Type: P Check: 2966 
1 MSLKDRYLNL ELKLINKLQE LPYVHQFIHD RISGRITLFL IVVGTLAFFN
51 ELYITIEMSL LQKNTSEELE RGRIDESLKL HRMLVSDEYH GKEYKDEKSG 
101 IVIEEFEDRD KFFAKPVFVS ELDVECNVIV DGKELLSTPL KFHVEFSPED 
151 YENEKRPEFG TTLRVLRLRL YHYFKDCEIY RDIIKNEGGE GARKFTISNG 
201 VKIYNHKDEL LPLNIDDVQL CFLKIDTGNT IKCEFIL 
FIGURE 43. YER044c DNA Sequence

Sequence contains 1200bp of 5' promoter sequence. 



Symbols: 1 to: 1647 from: chr5.gcg /rev ck: 9036, 237569 to: 2392 
Chromosome V Sequence 

Nature 387:78-81 [97313264] (1997) The nucleotide sequence of 
Saccharomyces cerevisiae chromosome V. 

Dietrich, F. S., Mulligan, J., Hennessy, K., Yelton, M. A., Allen, E.,
Araujo, R., Aviles, E., Berno, A., Brennan, T., Carpenter, J., Chen, ...

gcgseq.tmp.2512 Length: 1647 March 26, 1999 16:38 Type: N Check: 8794 

1 AACACTCCAA ATCTTGTTAG TTTCTCATTA TTCGCATCGC ATAGATTCTG
51 ATTCTTCTTT TAAGAGGACA CTGATAGACG TTCATGTTTT CAATTTCATC
101 GCCAAGTTTC TGTTTAATAG AATTTTATTG AAGAAGAACC AAAACGATCC
151 AAAATGGCTT CAAAACTTTT ACGACCAGGG AGATGGCAAA CATTTATGTG
201 ATAAAGTTGA CTACAAGCGC TTGTGTTCGT TGCATTTTAC CCTTATTTAC
251 TCTATTATTA ACATTCAACT CATCAAAATC AAGACAAACC AAACATTTGA
301 ACCGCAGATA TTAAAATACG TATCTGTTCT GAAATTAATT GAACACATAC
351 TTATCATCAT CGAAAGTCTG ATACATGTAC TTATTAGATT TGTATCGAAG
401 CATAAACTAA TATGCATCAA CCGGAAAAAG GCGTACTGTC GAGTATACCT
451 CGAAAGAGAA TTGAGTTTGA AGAAAACCTA CTTAAAGAAC TTTTACAGTG
501 TAATAAGCGG TGTCCCAGAA AAAGAGTTAG GGGGTCTATT GAAAATACTC
551 AAGATAGTTA TTCTATCATT GCTCGAGACA TTTGAAAGCA TTGAATGGCA
601 GCACTTAAAA CCTTTCCTGG AAAAATTTCC GGCTCATGAA ATATCGCTTC
651 AGAAGAAAAG GAAATATATA CAGGCGGCCT TATTAATTAC TGCCGAAAGA
701 AATTTGATAG CGCGCTTTCG ATTGTCAAGA TGGTTCAATG AGACAGAAAA
751 CATTTAATTT TTCTTTTGCA GTAGGAGGCG CATTATAAAA CACAAAAATA
801 TCGAAAGCTC TTTCATTTCG GGGACAACAA CTTCAGTTGA AAATTACAGT
851 GAACACAACA TCTTCCCCAA CAGACCTACA TTAAAACGCT TCTTCCGGAC
901 TTGCCCATGA TTAACCTAAT CTTATACGAA CTGAATTAAA CTTTACGGTA
951 TTACCGATAG GAAACTTCTA TTTTATGATT TTTTCGTTCG GGGACGGAAC
1001 GAACAGGAAA CAAAAAAAAA GGTACGATCC ATTGTATTCT CTACCCCCGT
1051 ATATAAAACT AAGCTGAACA AGCCTGTTGT TTTGCTTTAC TATTGCTACT
1101 ATTTTTGACG TAAACGCATT GACTAATTTC AGGTTTTTAT ATTCTTGACA
1151 CTAGCTAGAC CATAGTATCG AAGGATTCAA ATACACTAAA GTATCAGATA
1201 ATGTTCAGCC TACAAGACGT AATAACTACA ACCAAGACCA CCTTGGCAGC
1251 AATGCCAAAA GGTTACTTAC CAAAATGGTT ACTTTTCATT TCCATTGTAT
1301 CAGTCTTCAA TTCTATCCAG ACTTACGTTT CTGGTTTAGA ATTGACACGT
1351 AAAGTCTACG AAAGAAAACC CACTGAAACA ACCCATTTGA GTGCAAGAAC
1401 TTTCGGTACT TGGACCTTTA TTTCCTGTGT TATCAGATTC TATGGGGCTA
1451 TGTACTTGAA TGAACCACAC ATTTTCGAAT TGGTCTTCAT GTCTTATATG
1501 GTTGCCCTAT TCCACTTCGG CTCTGAATTA TTGATCTTTA GAACTTGTAA
1551 GTTGGGAAAG GGATTCATGG GTCCATTGGT TGTCTCAACC ACCTCTTTGG
1601 TTTGGATGTA CAAACAAAGA GAATACTACA CTGGTGTTGC TTGGTAA
FIGURE 44. YER044c Protein Sequence

Nature 387:78-81 [97313264] (1997) The nucleotide sequence of 
Saccharomyces cerevisiae chromosome V. 

Dietrich, F. S., Mulligan, J., Hennessy, K., Yelton, M. A., Allen, E.,
Araujo, R., Aviles, E., Berno, A., Brennan, T., Carpenter, J., Chen,
E., Cherry, J. M., Chung, E., Duncan, M., Guzman, E., Hartzell, G., et al.

YER044C Length: 148 March 26, 1999 16:40 Type: P Check: 3540 

1 MFSLQDVITT TKTTLAAMPK GYLPKWLLFI SIVSVFNSIQ TYVSGLELTR
51 KVYERKPTET THLSARTFGT WTFISCVIRF YGAMYLNEPH IFELVFMSYM
101 VALFHFGSEL LIFRTCKLGK GFMGPLVVST TSLVWMYKQR EYYTGVAW
FIGURE 45. Mouse EST with Similarity to YER044c

LOCUS AI386195 455 bp mRNA EST 27-JAN-1999
DEFINITION mq60h05.y1 Soares 2NbMT Mus musculus cDNA clone IMAGE:583161 5'
similar to SW:YEN4_YEAST P40030 HYPOTHETICAL 17.1 KD PROTEIN IN
SAH1-MEI4 INTERGENIC REGION. ;, mRNA sequence.
ACCESSION AI386195
NID g4199658
KEYWORDS EST.
SOURCE house mouse.
ORGANISM Mus musculus
Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Mammalia;
Eutheria; Rodentia; Sciurognathi; Muridae; Murinae; Mus.
REFERENCE 1 (bases 1 to 455)
AUTHORS Marra,M., Hillier,L., Kucaba,T., Martin,J., Beck,C., Wylie,T.,
Underwood,K., Steptoe,M., Theising,B., Allen,M., Bowers,Y.,
Person,B., Swaller,T., Gibbons,M., Pape,D., Harvey,N., Schurk,R.,
Ritter,E., Kohn,S., Shin,T., Jackson,Y., Cardenas,M., McCann,R.,
Waterston,R. and Wilson,R.
TITLE The WashU-NCI Mouse EST Project 1999
JOURNAL Unpublished (1999)
COMMENT
Contact: Marra M/WashU-NCI Mouse EST Project 1999
Washington University School of Medicine
4444 Forest Park Parkway, Box 8501, St. Louis, MO 63108, USA
Tel: 314 286 1800
Fax: 314 286 1810
Email: mouseest@watson.wustl.edu
This clone is available royalty-free through LLNL ; contact the
IMAGE Consortium (info@image.llnl.gov) for further information.
MGI:357809
This read is a RESEQUENCE of a previously sequenced mouse clone
This read has been verified (found to hit its original self in the
correct orientation)
Seq primer: -40RP from Gibco
High quality sequence stop: 455.
FEATURES Location/Qualifiers
source 1..455
/organism="Mus musculus"
/strain="C57BL/6J"
/note="Vector: pT7T3D-Pac (Pharmacia) with a modified
polylinker; Site_1: Not I; Site_2: Eco RI; 1st strand
cDNA was primed with a Not I - oligo(dT) primer [5'
TGTTACCAATCTGAAGTGGGAGCGGCCGCGTTTTTTTTTTTTTTTTTTTTTTTTT
3']; double-stranded cDNA was ligated to Eco RI adaptors
(Pharmacia), digested with Not I and cloned into the Not I
and Eco RI sites of the modified pT7T3 vector. RNA
provided by Dr. Bertrand Jordan. Library went through two
rounds of normalization, and was constructed by Bento
Soares and M.Fatima Bonaldo."
/db_xref="taxon:10090"
/clone="IMAGE:583161"
/clone_lib="Soares 2NbMT"
/sex="male"
/tissue_type="Thymus"
/dev_stage="4 weeks"
/lab_host="DH10B"
BASE COUNT 94 a 131 c 112 g 117 t 1 others
ORIGIN
1 tgcggatgct gctgatactg ctgcagtagt actggatcgt caggcagagc gccctctctt
61 ggaggggagt catgagccgc ttcctgaatg tgttacgaag ctggctggtt atggtgtcca
121 ttatagccat ggggaacaca ctccagagct tccgagacca cacttttctc tacgagaagc
181 tctacactgg caagccaaac cttgtgaatg gcctccaagc ccggaccttt gggatctgga
241 cgctgctctc atcagtgatt cgctgcctct gtgccattga catccacaac aaaacactct
301 atcacatcac actgtggaca ttcctcctcg ccctgngaca cttcctctca gagttgtttg
361 tatttggaac agcagctccc acagttggtg tgctggcacc cctgatggta gcaagtttct
421 caatcctggg catgctggtc gggctcccgt accta
//

FIGURE 45 (cont).
FIGURE 46. Human EST with Similarity to YER044c

LOCUS W28235 839 bp mRNA EST 08-MAY-1996
DEFINITION 43h8 Human retina cDNA randomly primed sublibrary Homo sapiens
cDNA, mRNA sequence.
ACCESSION W28235
NID g1308183
KEYWORDS EST.
SOURCE human.
ORGANISM Homo sapiens
Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Mammalia;
Eutheria; Primates; Catarrhini; Hominidae; Homo.
REFERENCE 1 (bases 1 to 839)
AUTHORS Macke,J., Smallwood,P. and Nathans,J.
TITLE Adult Human Retina cDNA
JOURNAL Unpublished (1996)
COMMENT
Contact: Dr. Jeremy Nathans
Dr. Jeremy Nathans, Dept. of Molecular Biology and Genetics
Johns Hopkins School of Medicine
725 North Wolfe Street, Baltimore, MD 21205
Tel: 410 955 4678
Fax: 410 614 0827
Email: jeremy_nathans@qmail.bs.jhu.edu
Clones from this library are NOT available.
PCR PRimers
FORWARD: CTTTTGAGCAAGTTCAGCCTGGTTAAGT
BACKWARD: GAGGTGGCTTATGAGTATTTCTTCCAGGGTAA
Seq primer: GGGTAAAAAGCAAAAGAATT.
FEATURES Location/Qualifiers
source 1..839
/organism="Homo sapiens"
/note="Organ: eye; Vector: lambda gt10; Site_1: EcoRI;
Site_2: EcoRI; The library used for sequencing was a
sublibrary derived from a human retina cDNA library.
Inserts from retina cDNA library DNA were isolated,
randomly primed, PCR amplified, size-selected, and cloned
into lambda gt10. Individual plaques were arrayed and
used as templates for PCR amplification, and these PCR
products were used for sequencing."
/db_xref="taxon:9606"
/clone_lib="Human retina cDNA randomly primed sublibrary"
/sex="mixed (males and females)"
/tissue_type="retina"
/dev_stage="adult"
/lab_host="E. coli strain K802"
BASE COUNT 127 a 141 c 136 g 140 t 295 others
ORIGIN
1 gnnnnnngnn nnnnnnnnnt tnttgagnac cgcagtngca gcagcagcag ccgctgncgc
61 aaacaagccc tcccacgttt gaggggagtc atgagccgtt tcctgaatgt gttaagaagt
121 tggctggtta tggtgtccat catagccatg gggaacacgc tgcagagctt ccgagaccac
181 acttttctct atgaaaagct ctacactggc aagccaaacc ttgtgaatgg cctccaagct
241 cggacctttg ggatctggac gctgctctca tcagtgattc gctgcctctg tgccattgac
301 attcacaaca agacgctcta tcacatcaca ctctggacct tcctccttgc cctggggcat
361 ttcctctctg agttgtttgt cttatggaac tgcagctccc acgattggng tcctggcanc
421 cctgatggtg gnaagtttct ccatcctggg tattgtggtc ggctccngta ttttagaagt
481 agaaccagtt ccagacagaa gaagagaact gaggcagaat atcaacccca gggtggatca
541 antgggttac aagtggttna aaannnnnnn nnnnnnnnnc nnnntnntnt naannnnnnn
601 nnnnnnnnnn nnnnnnnnna nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn
661 nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn
721 nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn
781 nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnc
//

FIGURE 46 (cont).
FIGURE 47. Rat EST with Similarity to YER044c

LOCUS AI172515 475 bp mRNA EST 11-FEB-1999
DEFINITION UI-R-C2p-nu-d-02-0-UI.s1 UI-R-C2p Rattus norvegicus cDNA clone
UI-R-C2p-nu-d-02-0-UI 3', mRNA sequence.
ACCESSION AI172515
NID g3712555
KEYWORDS EST.
SOURCE Norway rat.
ORGANISM Rattus norvegicus
Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Mammalia;
Eutheria; Rodentia; Sciurognathi; Muridae; Murinae; Rattus.
REFERENCE 1 (bases 1 to 475)
AUTHORS Bonaldo,M.F., Lennon,G. and Soares,M.B.
TITLE Normalization and subtraction: two approaches to facilitate gene
discovery
JOURNAL Genome Res. 6 (9), 791-806 (1996)
MEDLINE 97044477
COMMENT
Contact: Soares, MB
Program for Rat Gene Discovery and Mapping
University of Iowa
451 Eckstein Medical Research Building Iowa City, IA 52242, USA
Tel: 319 335 8250
Fax: 319 335 9565
Email: msoares@blue.weeg.uiowa.edu
The sequence tag present in the cDNA between the NotI site and the
oligo-dT track served to identify it as a clone from the normalized
adult Placenta library. cDNA Library Preparation: M. Fatima
Bonaldo, Ph.D. Clone distribution: clones will be available through
Research Genetics
Seq primer: M13 Forward.
FEATURES Location/Qualifiers
source 1..475
/organism="Rattus norvegicus"
/strain="Sprague-Dawley"
/note="Vector: pT7T3D-Pac (Pharmacia) with a modified
polylinker; Site_1: Not I; Site_2: Eco RI; The UI-R-C2p
library is a subtracted library derived from the UI-R-C1
library, which is a subtracted library derived from the
UI-R-C0 library. The UI-R-C0 library consisted of a
mixture of individually tagged normalized libraries
constructed from rat placenta, adult lung, brain, liver,
kidney, heart, spleen, ovary, muscle, 8, 12 and 18-day
embryo. The tag is a string of 3-5 nucleotides present
between the Not I site and the oligo-dT track which allows
identification of the library of origin of a clone within
the mixture. The subtracted library (UI-R-C2p) was
constructed as follows: PCR amplified cDNA inserts from
UI-R-C1 clones from which 3' ESTs had been derived was
used as a driver in a hybridization with the UI-R-C1
library in the form of single-stranded circles. The
remaining single-stranded circles (subtracted library) was
purified by hydroxyapatite column chromatography,
converted to double-stranded circles and electroporated
into DH10B bacteria (Life Technologies) to generate the
UI-R-C2p library. This procedure has been previously
described (Bonaldo, Lennon and Soares, Genome Research 6:
791-806, 1996)"
/db_xref="taxon:10116"
/clone="UI-R-C2p-nu-d-02-0-UI"
/clone_lib="UI-R-C2p"
/dev_stage="adult"
/lab_host="DH10B (Life Technologies)"
BASE COUNT 115 a 112 c 126 g 119 t 3 others
ORIGIN
1 tttttttttt tttttttctg tctggatact ggttctgctt ctaggtaccg gagcccaact
61 agcataccca ggattgagaa acttgctacc atcaagggtg ccagcacacc aactgtggga
121 gccgctgttc caaatacaaa caactccgag aggaagtgtc ccagggcaag gaggaatgtc
181 cacagtgtga tgtgatagag tgttttgttg tggatgtcaa tggcacagag gcagcgaatc
241 actgaagaga gcagcgtcca gatcccaaag gtccgggctt ggaggccatt cacaaggttt
301 ggtttgccag tgtanagctt ttcatanaga aaagtgtggt ctcggaagct ctggagcgtg
361 ttncccatgg ctatgatgga caccataacc agccagcttc gtagcacatt caggaagcgg
421 ctcatgactc ccctcaaaga gagggcgctc tgcctgaccc tcgtgccgaa ttctt
//

FIGURE 47 (cont).
FIGURE 48. YLR100w DNA Sequence



Sequence contains 800bp of 5' promoter sequence. 

Symbols: 1 to: 1844 from: chrl2.gcg ck: 2436, 341011 to: 342854 

Chromosome XII Sequence 

Nature 387:87-90 [97313267] (1997) The nucleotide sequence of 
Saccharomyces cerevisiae chromosome XII. 

Johnston, M., Hillier, L., Riles, L., Albermann, K., Andre, B.,
Ansorge, W., Benes, V., Bruckner, M., Delius, H., Dubois, E., ...

gcgseq.tmp.10136 Length: 1844 March 26, 1999 15:19 Type: N Check: 2071



1 ACGTACAAAA AAGAGCACGC TGCTTTATTT ATACTTTTGT GCCACAAGAA
51 TGATCAACAT CAACATAAAT ATCAACTAGT ATCTGCAACA CATCTGCTCC
101 ACGGAACTAA ACCCGTTGAG CAGTGCCCCG TGGAAACGTA AACTATCGCA
151 AATTGGGATT AACAAGCCAA AAACAGCCAA GCAAGATTCA CGAAACCGCG
201 CCTCGTTTGG ACCCCGAAGG CCCATTTAAC GGCCGGCCGT TACAAGCAAG
251 ATCGGCAGAG CAAACCACTC CCCAGCACCA CAGCACATCA CTGCACGAGC
301 AACAATAACT AGAACATGGC AGATAGCGAG GATACCTCTG TGATCCTGCA
351 GGGCATCGAC ACAATCAACA GCGTGGAGGG CCTGGAAGAA GATGGTTACC
401 TCAGCGACGA GGACACGTCA CTCAGCAACG AGCTCGCAGA TGCACAGCGT
451 CAATGGGAAG AGTCGCTGCA ACAGTTGAAC AAGCTGCTCA ACTGGGTCCT
501 GCTGCCCCTG CTGGGCAAGT ATATAGGTAG GAGAATGGCC AAGACTCTAT
551 GGAGTAGGTT CATTGAACAC TTTGTATAAG TGTTTGTTGT TTATGTATCC
601 GCATATAGCA GTTATAACAG ATAAATGGCA CTTTTCGCAC ACCCGTTGTT
651 TTATCTCCGA TAGTACGTGG GCCTTTATTT ATGGTCGTTT AACGAAAGAA
701 CGGCATCTTG AATTGAGCAG GTATTTAAAA GATAGGACGA GAAACAAGCA
751 CATGATCTGT GTCGAAAAAA AGTAGCAAAG AGAAAAAGTA GGAGGATAGG
801 ATGAACAGGA AAGTAGCTAT CGTAACGGGT ACTAATAGTA ATCTTGGTCT
851 GAACATTGTG TTCCGTCTGA TTGAAACTGA GGACACCAAT GTCAGATTGA
901 CCATTGTGGT GACTTCTAGA ACGCTTCCTC GAGTGCAGGA GGTGATTAAC
951 CAGATTAAAG ATTTTTACAA GAAATCAGGC CGTGTAGAGG ATTTGGAAAT
1001 AGACTTTGAT TATCTGTTGG TGGACTTCAC CAACATGGTG AGTGTCTTGA
1051 ACGCATATTA CGACATCAAC AAAAAGTACA GGGCGATAAA CTACCTTTTC
1101 GTGAATGCTG CGCAAGGTAT CTTTGACGGT ATAGATTGGA TCGGAGCGGT
1151 CAAGGAGGTT TTCACCAATC CATTGGAGGC AGTGACAAAT CCGACATACA
1201 AGATACAACT GGTGGGCGTC AAGTCTAAAG ATGACATGGG GCTTATTTTC
1251 CAGGCCAATG TGTTTGGTCC GTACTACTTT ATCAGTAAAA TTCTGCCTCA
1301 ATTGACCAGG GGAAAGGCTT ATATTGTTTG GATTTCGAGT ATTATGTCCG
1351 ATCCTAAGTA TCTTTCGTTG AACGATATTG AACTACTAAA GACAAATGCC
1401 TCTTATGAGG GCTCCAAGCG TTTAGTTGAT TTACTGCATT TGGCCACCTA
1451 CAAAGACTTG AAAAAGCTGG GCATAAATCA GTATGTAGTT CAACCGGGCA
1501 TATTTACAAG CCATTCCTTC TCCGAATATT TGAATTTTTT CACCTATTTC
1551 GGCATGCTAT GCTTGTTCTA TTTGGCCAGG CTGTTGGGGT CTCCATGGCA
1601 CAATATTGAT GGTTATAAAG CTGCCAATGC CCCAGTATAC GTAACTAGAT
1651 TGGCCAATCC AAACTTTGAG AAACAAGACG TAAAATACGG TTCTGCTACC
1701 TCTAGGGATG GTATGCCATA TATCAAGACG CAGGAAATAG ACCCTACTGG
1751 AATGTCTGAT GTCTTCGCTT ATATACAGAA GAAGAAACTG GAATGGGACG
1801 AGAAACTGAA AGATCAAATT GTTGAAACTA GAACCCCCAT TTAA
FIGURE 49. YLR100w Protein Sequence

Nature 387:87-90 [97313267] (1997) The nucleotide sequence of 
Saccharomyces cerevisiae chromosome XII. 

Johnston, M., Hillier, L., Riles, L., Albermann, K., Andre, B.,
Ansorge, W., Benes, V., Bruckner, M., Delius, H., Dubois, E.,
Dusterhoft, A., Entian, K. D., Floeth, M., Goffeau, A., Hebling, U., et al.

YLR100W Length: 347 March 26, 1999 15:20 Type: P Check: 2853 .. 



1 MNRKVAIVTG TNSNLGLNIV FRLIETEDTN VRLTIVVTSR TLPRVQEVIN
51 QIKDFYNKSG RVEDLEIDFD YLLVDFTNMV SVLNAYYDIN KKYRAINYLF
101 VNAAQGIFDG IDWIGAVKEV FTNPLEAVTN PTYKIQLVGV KSKDDMGLIF
151 QANVFGPYYF ISKILPQLTR GKAYIVWISS IMSDPKYLSL NDIELLKTNA
201 SYEGSKRLVD LLHLATYKDL KKLGINQYVV QPGIFTSHSF SEYLNFFTYF
251 GMLCLFYLAR LLGSPWHNID GYKAANAPVY VTRLANPNFE KQDVKYGSAT
301 SRDGMPYIKT QEIDPTGMSD VFAYIQKKKL EWDEKLKDQI VETRTPI
FIGURE 50. Human EST with Similarity to YLR100w

LOCUS R92053 454 bp mRNA EST 25-AUG-1995
DEFINITION yp96c01.r1 Soares fetal liver spleen 1NFLS Homo sapiens cDNA clone
IMAGE:195264 5', mRNA sequence.
ACCESSION R92053
NID g959593
KEYWORDS EST.
SOURCE human.
ORGANISM Homo sapiens
Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Mammalia;
Eutheria; Primates; Catarrhini; Hominidae; Homo.
REFERENCE 1 (bases 1 to 454)
AUTHORS Hillier,L., Clark,N., Dubuque,T., Elliston,K., Hawkins,M.,
Holman,M., Hultman,M., Kucaba,T., Le,M., Lennon,G., Marra,M.,
Parsons,J., Rifkin,L., Rohlfing,T., Soares,M., Tan,F.,
Trevaskis,E., Waterston,R., Williamson,A., Wohldmann,P. and
Wilson,R.
TITLE The WashU-Merck EST Project
JOURNAL Unpublished (1995)
COMMENT
Contact: Wilson RK
Washington University School of Medicine
4444 Forest Park Parkway, Box 8501, St. Louis, MO 63108
Tel: 314 286 1800
Fax: 314 286 1810
Email: est@watson.wustl.edu
Insert Size: 1067
High quality sequence stops: 337
Source: IMAGE Consortium, LLNL
This clone is available royalty-free through LLNL ; contact the
IMAGE Consortium (info@image.llnl.gov) for further information.
Insert Length: 1067 Std Error: 0.00
Seq primer: M13RP1
High quality sequence stop: 337.
FEATURES Location/Qualifiers
source 1..454
/organism="Homo sapiens"
/note="Organ: Liver and Spleen; Vector: pT7T3D (Pharmacia)
with a modified polylinker; Site_1: Pac I; Site_2: Eco RI;
1st strand cDNA was primed with a Pac I - oligo(dT) primer
[5' AACTGGAAGAATTAATTAAAGATCTTTTTTTTTTTTTTTTTTT 3'],
double-stranded cDNA was ligated to Eco RI adaptors
(Pharmacia), digested with Pac I and cloned into the Pac I
and Eco RI sites of the modified pT7T3 vector. Library
went through one round of normalization. Library
constructed by Bento Soares and M.Fatima Bonaldo."
/db_xref="GDB:3764314"
/db_xref="taxon:9606"
/clone="IMAGE:195264"
/clone_lib="Soares fetal liver spleen 1NFLS"
/sex="male"
/dev_stage="20 week-post conception fetus"
/lab_host="DH10B (ampicillin resistant)"
BASE COUNT 115 a 111 c 96 g 129 t 3 others
ORIGIN
1 tttgagacca atgtctttgg ccattttatc ctgattcggg aactggagcc tctcctctgt
61 cacagtgaca atccatctca gctcatctgg acatcatctc gcagtgcaag gaaatctaat
121 ttcagcctcg aggacttcca gcacagcaaa ggcaaggaac cctacagctc ttccaaatat
181 gccactgacc ttttgagtgt ggctttgaac aggaacttca accagcaggg tctctattcc
241 aatgtggcct gtccaggtac agcattgacc aatttgacat atggaattct gcctccgttt
301 atatggacgc tgttggatgc cggcaatatt gctacttcgc ttttttggca aatggcattc
361 actttggaca ccatataatg ggaacaggaa gntatgggta tgggnttttc ccaccaaaag
421 gctggaatcn tttcaatcct ctggatccaa atat
//

FIGURE 50 (cont).
FIGURE 51. Mouse EST with Similarity to YLR100w

LOCUS AI226514 1039 bp mRNA EST 29-OCT-1998 

DEFINITION uj07d08.y1 Sugano mouse liver mlia Mus musculus cDNA clone
IMAGE:1891215 5' similar to TR:Q62904 Q62904 OVARIAN-SPECIFIC
PROTEIN. ;, mRNA sequence.
ACCESSION AI226514 
NID g3809567 
KEYWORDS EST. 
SOURCE house mouse. 

ORGANISM Mus musculus 

Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Mammalia; 
Eutheria; Rodentia; Sciurognathi ; Muridae; Murinae; Mus. 
REFERENCE 1 (bases 1 to 1039) 

AUTHORS Marra,M., Hillier,L., Allen,M., Bowles,M., Dietrich,N., Dubuque,T.,
Geisel,S., Kucaba,T., Lacy,M., Le,M., Martin,J., Morris,M.,
Schellenberg,K., Steptoe,M., Tan,F., Underwood,K., Moore,B.,
Theising,B., Wylie,T., Lennon,G., Soares,B., Wilson,R. and
Waterston,R.
TITLE The WashU-HHMI Mouse EST Project 

JOURNAL Unpublished (1996) 
COMMENT 

Contact: Marra M/Mouse EST Project 

WashU-HHMI Mouse EST Project 

Washington University School of Medicine

4444 Forest Park Parkway, Box 8501, St. Louis, MO 63108 

Tel: 314 286 1800 

Fax: 314 286 1810 

Email: mouseest@watson.wustl.edu 

This clone is available royalty-free through LLNL ; contact the 
IMAGE Consortium (info@image.llnl.gov) for further information. 
MGI: 975539 

Seq primer: custom primer used 
High quality sequence stop: 509. 
FEATURES Location/Qualifiers 
source 1..1039

/organism="Mus musculus" 
/strain="C57BL" 

/note="Organ: liver; Vector: pME18S-FL3; Site_1: DraIII
(CACTGTGTG); Site_2: DraIII (CACCATGTG); 1st strand cDNA
was primed with an oligo(dT) primer
[ATGTGGCCTTTTTTTTTTTTTTTTT]; double-stranded cDNA was
ligated to a DraIII adaptor [TGTTGGCCTACTGG], digested
and cloned into distinct DraIII sites of the pME18S-FL3
vector (5' site CACTGTGTG, 3' site CACCATGTG). XhoI should
be used to isolate the cDNA insert. Size selection was
performed to exclude fragments <1.5kb. Library
constructed by Dr. Sumio Sugano (University of Tokyo
Institute of Medical Science). Custom primers for
sequencing: 5' end primer CTTCTGCTCTAAAAGCTGCG and 3' end
primer CGACCTGCAGCTCGAGCACA."

/db_xref="taxon:10090"
/clone="IMAGE:1891215"
/clone_lib="Sugano mouse liver mlia"

/sex="female" 

/dev_stage="adult" 

/lab_host="DH10B" 
BASE COUNT 245 a 267 c 251 g 272 t 4 others
ORIGIN 

1 ggctaagaga accccggtgc agttctactt cggtgcaggg cgtggaagat gcggaaggtg 

61 gttttgatca ccggggcgag cagtggcatt gggctagccc tttgcggtcg actgctggca 

121 gaagacgatg acctccacct gtgtttggcg tgtaggaacc tgagcaaagc aagagctgtt 

181 cgagataccc tgctggcctc tcacccctcc gccgaagtca gcatcgtgca gatggatgtc 

241 agcagcctgc agtcggtggt ccggggtgca gaggaagtca agcaaaagtt tcaaagatta 

301 gactacttat atctgaatgc tggaatcctg cctaatccac aattcaacct caaggcattt 

361 ttctgcggca tcttttcaag aaatgtgatt catatgttca ccacagcgga aggaattttg 

421 acccagaatg actcggtcac tgccgacggg ttgcaggagg tgtttgaaac caatctcttt 

481 ggccacttta ttctgattcg ggaactggaa ccacttctct gccatgccga caacccctct 

541 cagctcatct ggacgtcctc tcgcaatgca aagaaggcta acttcagcct ggaggacata 

601 cagcacttca aaggcccgga accctacagc tctttccaat atgctaccga cctcctgaat 

661 gtggctntga acagggaatt caaaccagaa ggtctggtat tcagtggtga ttgccgaggg 

721 cgtctgatga ccaatatgac gtatggaaat ttgccttcct ttatcctgac cgtggttcta 

781 cccttaagtg ggctccttcg cttttttgaa aatgccctca cctgggaccc cgtaccactg 

841 atcaaaagct ctgggtgtgt ttctttcaca tataaccgga ggcttttatt ctttgaccaa 

901 atacgcgagc tccaccttgg tagtgggact atataccgac cggtcccacg aatgcactca 

961 tttaacacct tgtcaaaact ttttatagtt ttacctgttg tgataacgtg gtgntacccc 
1021 cttcgtantt gnaataccc 

// 



FIGURE 51 (cont).
FIGURE 52. Mouse EST with Similarity to YLR100w

LOCUS AI528381 837 bp mRNA EST 18-MAR-1999
DEFINITION ui96g06.y1 Sugano mouse liver mlia Mus musculus cDNA clone
IMAGE:1890298 5' similar to TR:Q62904 Q62904 OVARIAN-SPECIFIC
PROTEIN. ;, mRNA sequence.
ACCESSION AI528381
NID g4442516
KEYWORDS EST.
SOURCE house mouse.
ORGANISM Mus musculus
Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Mammalia;
Eutheria; Rodentia; Sciurognathi; Muridae; Murinae; Mus.
REFERENCE 1 (bases 1 to 837)
AUTHORS Marra,M., Hillier,L., Kucaba,T., Martin,J., Beck,C., Wylie,T.,
Underwood,K., Steptoe,M., Theising,B., Allen,M., Bowers,Y.,
Person,B., Swaller,T., Gibbons,M., Pape,D., Harvey,N., Schurk,R.,
Ritter,E., Kohn,S., Shin,T., Jackson,Y., Cardenas,M., McCann,R.,
Waterston,R. and Wilson,R.
TITLE The WashU-NCI Mouse EST Project 1999
JOURNAL Unpublished (1999)
COMMENT
Contact: Marra M/WashU-NCI Mouse EST Project 1999
Washington University School of Medicine
4444 Forest Park Parkway, Box 8501, St. Louis, MO 63108, USA
Tel: 314 286 1800
Fax: 314 286 1810
Email: mouseest@watson.wustl.edu
This clone is available royalty-free through LLNL ; contact the
IMAGE Consortium (info@image.llnl.gov) for further information.
MGI:974622
Possible reversed clone: similarity on wrong strand
Seq primer: custom primer used
High quality sequence stop: 429.
FEATURES Location/Qualifiers
source 1..837
/organism="Mus musculus"
/strain="C57BL"
/note="Organ: liver; Vector: pME18S-FL3; Site_1: DraIII
(CACTGTGTG); Site_2: DraIII (CACCATGTG); 1st strand cDNA
was primed with an oligo(dT) primer
[ATGTGGCCTTTTTTTTTTTTTTTTT]; double-stranded cDNA was
ligated to a DraIII adaptor [TGTTGGCCTACTGG], digested
and cloned into distinct DraIII sites of the pME18S-FL3
vector (5' site CACTGTGTG, 3' site CACCATGTG). XhoI should
be used to isolate the cDNA insert. Size selection was
performed to exclude fragments <1.5kb. Library
constructed by Dr. Sumio Sugano (University of Tokyo
Institute of Medical Science). Custom primers for
sequencing: 5' end primer CTTCTGCTCTAAAAGCTGCG and 3' end
primer CGACCTGCAGCTCGAGCACA."
/db_xref="taxon:10090"
/clone="IMAGE:1890298"
/clone_lib="Sugano mouse liver mlia"
/sex="female"
/dev_stage="adult"
/lab_host="DH10B"
BASE COUNT 191 a 222 c 212 g 208 t 4 others
ORIGIN
1 ggctaagaga accccggtgc agttctactt cggtgcaggg cgtggaagat gcggaaggtg
61 gttttgatca ccggggcgag cagtggcatt gggctagccc tttgcggtcg actgctggca
121 gaagacgatg acctccacct gtgtttggcg tgtaggaacc tgagcaaagc aagagctgtt
181 cgagataccc tgctggcctc tcacccctcc gccgaagtca gcatcgtgca gatggatgtc
241 agcagcctgc agtcggtggt ccggggtgca gaggaagtca agcaaaagtt tcaaagatta
301 gactacttat atctgaatgc tggaatcctg cctaatccac aattcaacct caaggcattt
361 ttctgcggca tcttttcaag aaatgtgatt catatgttca ccacagcgga aggaattttg
421 acccagaatg actcggtcac tgccgaccgg ttgcaggagg tgtttgaaac caatctctct
481 tgccacttta ttctgattcg ggaactggaa ccacttctct tgcatgcgga caacccctct
541 cagctcatct ggacgtcctc tcgcaatgca nagaaggcta acttcagcct ggaggacatn
601 cagcactcca tagggcccgg accctacagc tctttccaat atgctaccga cctcctgaat
661 gtggctttga acangaatnt caaccagaag ggtctgtatt ccagtcgcat gtgcccaggc
721 gtcgtgatga ccaatatgac gtatggaatc ttgcctccct tttatctgga cgtgctccta
781 cccatgatgg tgctccttcg cttctttggt aatgcgctta ctgggacacc gtacaac
//

FIGURE 52 (cont).
FIGURE 53. Mouse Gene with Similarity to YLR100w



LOCUS 3319971 334 aa 14-JUL-1998 

DEFINITION 17-beta-hydroxysteroid dehydrogenase type 7. 
ACCESSION 3319971 
PID g3319971 

DBSOURCE EMBL: locus MMY15733, accession Y15733 
KEYWORDS 

SOURCE house mouse. 

ORGANISM Mus musculus

Eukaryota; Metazoa; Chordata; Vertebrata; Mammalia; Eutheria; 
Rodentia; Sciurognathi; Muridae; Murinae; Mus. 
REFERENCE 1 (residues 1 to 334) 

AUTHORS Nokelainen,P., Peltoketo,H., Vihko,R. and Vihko,P.
TITLE Expression cloning of a novel estrogenic mouse 17
beta-hydroxysteroid dehydrogenase/17-ketosteroid reductase
(m17HSD7), previously described as a prolactin
receptor-associated protein (PRAP) in rat
JOURNAL Mol. Endocrinol. 12 (7), 1048-1059 (1998) 
MEDLINE 98322544 
REFERENCE 2 (residues 1 to 334) 
AUTHORS Nokelainen,P.A.
TITLE Direct Submission
JOURNAL Submitted (27-NOV-1997) P. A. Nokelainen, University of Oulu,
Biocenter Oulu, WHO Collaborating Centre for Research on 
Reproductive Health Department of Clinical Chemistry, Kajaanintie 
50, FIN-90220 Oulu, FINLAND 
FEATURES Location/Qualifiers
source 1..334
/organism="Mus musculus"
/strain="BALB/c"
/db_xref="taxon:10090"
/tissue_type="mammary gland"
/cell_type="epithelial cell derived from mammary gland of
a pregnant mouse"
/clone_lib="cDNA library prepared from poly(A)-enriched
RNA isolated from HC11 cell line"
/clone="m17HSD7.1"
/clone="m17HSD7.2"
Protein 1..334
/product="17-beta-hydroxysteroid dehydrogenase type 7"
CDS 1..334
/gene="HSD17B7"
/db_xref="SPTREMBL:O88736"
/coded_by="Y15733:64..1068"
ORIGIN



1 mrkvvlitga ssgiglalcg rllaedddlh lclacrnlsk aravrdtlla shpsaevsiv
61 qmdvsslqsv vrgaeevkqk fqrldylyln agilpnpqfn lkaffcgifs rnvihmftta 
121 egiltqndsv tadglqevfe tnlfghfili relepllcha dnpsqliwts srnakkanfs 
181 lediqhskgp epyssskyat dllnvalnrn fnqkglyssv mcpgvvmtnm tygilppfiw 
241 tlllpimwll rffvnaltvt pyngaealvw lfhqkpesln pltkyasats gfgtnyvtgq 
301 kmdidedtae kfyevllele krvrttvqks dhps 



// 
FIGURE 54. YER034w DNA Sequence



Sequence contains 559bp of 5' promoter sequence. 

Symbols: 1 to: 1117 from: chr5.gcg ck: 9036, 221286 to: 222402 

Chromosome V Sequence 

Nature 387:78-81 [97313264] (1997) The nucleotide sequence of 
Saccharomyces cerevisiae chromosome V. 

Dietrich, F. S., Mulligan, J., Hennessy, K., Yelton, M. A., Allen, E.,
Araujo, R., Aviles, E., Berno, A., Brennan, T., Carpenter, J., Chen, ...

gcgseq.tmp.6597 Length: 1117 March 26, 1999 16:54 Type: N Check: 5026 . 



1 T GAT GAAAT A TTCCAGTTAT GCGTGTGCGT CTTGTGATGC AGATCCTTTT 

51 GGGCAAAAAC AGTTGGTTTG TGCGAAAACG CAAGGTAATA AATAGGCTTA 

101 AAGGAACTAA AAAAAAAAAA AGGAAAATAA CCAGCTAAGA TTTAAGGTAC 

151 AAGAAAGCGG TTGCACCTCA AGTAATGATA GTTATTAAAC CTTGGATTGG 

2 01 ACCAGATGTT TAAAATTGTT TTCAATAGTA GATTTGCAGT CGTAAATGCG 

2 51 TTCTCAGCAA TATCATATTG TGTTTATGAA GTATTACCAA AC GGGTAGAA 

301 GAACGGTTTA AGAGAATATG TCCGGATAAA GCGATCAGGA GAAAAGCTTA 

351 AAACCCAAAG TGGTCAATCT GCAGCCCATT TAGGCACTCT GCATTTAACC 

4 01 GATACCCGGA TTGAAGAAAG CTGGCGGGTG TATGGGTGAA GGAGAAGAAA 

4 51 GGAAGTGATT AGGAGAAACC TCATGGAGAT GAGCACATGC TACAACTAAT 

501 AACGTTATTC TACTTAAAAC GAGCAAAACA AAAAAAAAAA CAAGACAATT 

551 GAAAACGCAA TGGATGCATT CAGCTTAAAG AAGGATAATC GAAAAAAATT 

601 T CAAGAT AAA CAGAAATTGA AAAGAAAACA TGCCACACCC AGT GATAGAA 

651 AGTACCGGCT ATTGAACCGC CAAAAAGAAG AGAAAGCTAC CACAGAGGAG 

701 AAAGATCAAG ACCAAGAACA GCCCGCCCTG AAGTCAAACG AGGACAGGTA 

751 CTATGAGGAC CCGGTACTCG AGGACCCGCA TTCTGCAGTC GCCAATGCAG 

801 AGTTGAACAA GGTGCTAAAA GACGTCCTCA AAAATCGGCT CCAGCAGAAC 
851 GACGACGCCA CAGCCGTCAA TAATGTTGCT AATAAAGATA CTTTGAAAAT 
901 CAAAGACCTC AAGCAGATGA ATACGGATGA GCTCAATCGT TGGCTCGGAC 
951 GGCAGAATAC AACATCGGCT ATAACAGCGG CTGAGCCCGA ATCATTAGTC 

1001 GTTCCCATTC ACGTACAAGG TGATCATGAT CGTGCGGGCA AGAAGATCAG 

1051 TGCCCCTTCG ACCGATCTAC CGGAAGAACT AGAGACCGAT CAGGATTTCC 

1101 TTGATGGACT GCTCTAA 
FIGURE 55. YER034w Protein Sequence 

Nature 387:78-81 [97313264] (1997) The nucleotide sequence of 
Saccharomyces cerevisiae chromosome V. 

Dietrich, F. S., Mulligan, J., Hennessy, K., Yelton, M. A., Allen, E., 
Araujo, R., Aviles, E., Berno, A., Brennan, T., Carpenter, J., Chen, 
E., Cherry, J. M., Chung, E., Duncan, M., Guzman, E., Hartzell, G., et al. 

YER034W Length: 185 March 26, 1999 16:55 Type: P Check: 3501 
1 MDAFSLKKDN RKKFQDKQKL KRKHATPSDR KYRLLNRQKE EKATTEEKDQ 
51 DQEQPALKSN EDRYYEDPVL EDPHSAVANA ELNKVLKDVL KNRLQQNDDA 
101 TAVNNVANKD TLKIKDLKQM NTDELNRWLG RQNTTSAITA AEPESLVVPI 
151 HVQGDHDRAG KKISAPSTDL PEELETDQDF LDGLL 
FIGURE 56. YKL077w DNA Sequence 

Sequence contains 1200bp of 5' promoter sequence. 

Symbols: 1 to: 2379 from: chr11.gcg ck: 9298, 289895 to: 292273 

Chromosome XI Sequence 

Nature 387:98-102 [97313270] (1997) The nucleotide sequence of 
Saccharomyces cerevisiae chromosome XV. 

Dujon, B., Albermann, K., Aldea, M., Alexandraki, D., Ansorge, W., 
Arino, J., Benes, V., Bohn, C., Bolotin-Fukuhara, M., Bordonne, R., . . . 

gcgseq.tmp.4920 Length: 2379 March 26, 1999 16:48 Type: N Check: 4118 

1 GAAAGGAAGC TATAGTAATG GGGCTTCAGG AACTTTATGA ATTGGGTGCT 

51 CTTGACACTC GTGGAAAGAT AACTAAACGG GGTCAACAAA TGGCTCTGTT 

101 ACCGCTACAA CCGCATTTAA GTAGTGTCTT AATTAAAGCC AGTGAAGTCG 

151 GATGTTTGAG TCAGGTCATT GATATCGTCT CTTGCCTTAG TGTGGAAAAT 

201 TTACTGTTGA ATCCGTCACC AGAAGAAAGA GATGAGGTGA ACGAGCGTCG 

251 TTTGTCCTTA TGCAACGCTG GTAAAAGGTA TGGTGACCTT ATCATGCTGA 

301 AAGAGCTTTT TGATATCTAT TTCTACGAAC TAGGGAAAAG TCAAGATGCA 

351 AGCTCTGAAA GAAATGATTG GTGTAAAGGA TTGTGTATTT CGATACGTGG 

401 GTTTAAAAAT GTAATTCGTG TCAGAGACCA GTTAAGAGTT TATTGTAAGC 

451 GTTTGTTTTC TTCAATCAGT GAAGAGGATG AAGAATCCAA AAAGATTGGT 

501 GAAGATGGCG AGCTAATTTC GAAAATTTTA AAGTGTTTCT TAACTGGGTT 

551 TATCAAGAAT ACAGCTATAG GGATGCCAGA CAGGTCTTAT AGAACTGTTT 

601 CCACTGGAGA GCCGATAAGC ATTCATCCAT CATCTATGCT ATTTATGAAT 

651 AAAAGCTGCC CCGGTATAAT GTACACGGAG TATGTCTTTA CTACGAAGGG 

701 ATATGCCAGA AATGTTAGTA GGATTGAACT TTCATGGTTA CAAGAAGTTG 

751 TCACTAATGC AGCCGCTGTA GCAAAGCAAA AAGTTTCTGA TTCAAAATAA 

801 GTCACCTACT CTTAGCGCAT TTTTATTGTA TATAAAGGCA TTTAATGTAA 

851 TTTATAGAGC ATTATAAATC GTAACAACTA CTGCAGTATG AGTTTCATGG 

901 ATTCATTTCT CAATATCTTA TGAATATACA CAGGTATATA TGTATATTCA 

951 TGTTAAACGC CTTTCGAATT GTTCGTTGGC TTTTTTTGTG AAATTATCTC 

1001 GGGAAAAGGG CGAAATTATA TTATTTTGCC GTTGACATTT TGAAAAGGAA 

1051 TAAAAGATCA TGAAAAAAAT AAGAAAGGCA ATTCGACGCA TTTCTCTCAG 

1101 CAAGCTATTC TTTACTTTTG AAGAACAAAA TATTTTAGCA AAAAGGTTAA 

1151 GACAATATAG TCGGAAGCAG TTCTGCGGGA TCTGAAGGAA TTGCGGAATA 

1201 ATGAGATTTC ACGATAGTAT ACTTATCTTC TTTTCTTTGG CATCGCTTTA 

1251 TCAACATGTT CATGGTGCAA GACAAGTCGT TCGTCCAAAG GAGAAAATGA 

1301 CTACTTCAGA AGAAGTTAAA CCTTGGTTAC GTACGGTTTA TGGAAGTCAA 

1351 AAAGAATTAG TCACTCCTAC GGTCATTGCC GGTGTCACTT TTTCTGAAAA 

1401 ACCAGAAGAA ACACCAAATC CATTGAAACC TTGGGTATCT TTAGAGCATG 

1451 ATGGTAGGCC AAAAACCATT AAACCAGAAA TCAACAAAGG TCGAACCAAG 

1501 AAGGGAAGAC CTGATTACTC AACTTACTTC AAAACGGTAA GTTCCCACAC 

1551 ATATTCTTAT GAAGAATTGA AGGCTCACAA TATGGGCCCT AATGAAGTTT 

1601 TTGTAGAAGA AGAGTATATT GATGAAGATG ACACCTACGT CTCCCTGAAT 

1651 CCTATTGTCA GATGTACTCC TAATCTTTAC TTCAATAAAG GTCTAGCAAA 

1701 GGATATCCGC AGTGAGCCAT TTTGTACCCC TTATGAGAAT TCTAGATGGA 

1751 AGGTTGACAA GACTTACTTC GTTACTTGGT ATACAAGATT TTTTACAGAT 

1801 GAGAATTCCG GTAAAGTTGC TGATAAGGTT CGTGTTCATT TGTCCTATGT 

1851 TAAAGAAAAC CCCGTAGAGA AGGGCAATTA TAAAAGAGAT ATCCCTGCAA 

1901 CTTTTTTCTC TTCCGAATGG ATTGATAATG ACAACGGTCT AATGCCGGTT 

1951 GAGGTCAGAG ATGAATGGCT GCAGGACCAA TTTGATCGTA GGATCGTTGT 

2001 ATCAGTTCAG CCAATATACA TATCAGATGA AGATTTCGAT CCACTACAAT 

2051 ACGGTATTTT ATTATACATC ACTAAGGGTT CAAAAGTGTT TAAGCCTACT 

2101 AAGGAGCAAC TGGCTTTAGA CGATGCAGGT ATAACAAATG ATCAGTGGTA 

2151 TTATGTTGCA TTATCTATCC CTACTGTCGT GGTGGTATTT TTCGTCTTCA 

2201 TGTACTTTTT CTTATATGTC AACGGGAAAA ACAGAGATTT CACAGATGTT 

2251 ACTAGAAAAG CTTTAAACAA GAAACGCCGT GTTTTGGGTA AGTTCTCGGA 

2301 GATGAAGAAA TTCAAAAACA TGAAAAATCA CAAGTACACC GAATTGCCAT 

2351 CTTATAAGAA AACCAGTAAA CAAAATTAG 
FIGURE 57. YKL077W Protein Sequence 

Nature 387:98-102 [97313270] (1997) The nucleotide sequence of 
Saccharomyces cerevisiae chromosome XV. 

Dujon, B., Albermann, K., Aldea, M., Alexandraki, D., Ansorge, W., 
Arino, J., Benes, V., Bohn, C., Bolotin-Fukuhara, M., Bordonne, R., 
Boyer, J., Camasses, A., Casamayor, A., Casas, C., Cheret, G., et al. 

YKL077W Length: 392 March 26, 1999 16:50 Type: P Check: 1732 . 



1 MRFHDSILIF FSLASLYQHV HGARQVVRPK EKMTTSEEVK PWLRTVYGSQ 
51 KELVTPTVIA GVTFSEKPEE TPNPLKPWVS LEHDGRPKTI KPEINKGRTK 
101 KGRPDYSTYF KTVSSHTYSY EELKAHNMGP NEVFVEEEYI DEDDTYVSLN 
151 PIVRCTPNLY FNKGLAKDIR SEPFCTPYEN SRWKVDKTYF VTWYTRFFTD 
201 ENSGKVADKV RVHLSYVKEN PVEKGNYKRD IPATFFSSEW IDNDNGLMPV 
251 EVRDEWLQDQ FDRRIVVSVQ PIYISDEDFD PLQYGILLYI TKGSKVFKPT 
301 KEQLALDDAG ITNDQWYYVA LSIPTVVVVF FVFMYFFLYV NGKNRDFTDV 
351 TRKALNKKRR VLGKFSEMKK FKNMKNHKYT ELPSYKKTSK QN 
FIGURE 58. YGR046w DNA Sequence 

Sequence contains 599bp of 5' promoter sequence. 

Symbols: 1 to: 1757 from: chr7.gcg ck: 9962, 584290 to: 5860 

Chromosome VII Sequence 

Nature 387:81-84 [97313265] (1997) The nucleotide sequence of 
Saccharomyces cerevisiae chromosome VII. 

Tettelin, H., Agostoni Carbone, M. L., Albermann, K., Albers, M., 
Arroyo, J., Backes, U., Barreiros, T., Bertani, I., Bjourson, A. J., . . . 

gcgseq.tmp.22 8 Length: 1757 March 26, 1999 16:44 Type: N Check: 9449 

1 TCTCACTCCG GCGGCCATTT TACGTGACGA AGCATCCCTT ACAACAGAAA 

51 GAAGAAAAAA GATATGCCGC TTTGCGGTTT CTTTCTGGCA ATGTATGCAC 

101 TCATAATGCT ACTCGTTTAC CCACTATCCC TGTCCAAACT AAAGAGGGAG 

151 GAAAGCACTT TTTGCATTTA CACATCGTAG ATTATAAAAT GATCGTTAAC 

201 AGGCGCTTGT GATTTTGAAT TTAAGAAATG TGGACTAGAG AAGTCTTAAA 

251 TCGCCAATGC TGTACCAGAC TCTCTATAGC ATCTAAACAC GAAATTCAAC 

301 TGTTATCTTA GTTTTTCACT TACCAGTAGC GCGCTTGTTA TTCCCACGTT 

351 ATTATTTGCC CCCACATCAT AGGTCAAGTG ACCTTCTCTT ACCCGACATG 

401 AATAAAGAAA AGAAAAGAAA TCATACCCTT CAGCCTGTTT AGCCATAAAT 

451 AGTAAAGAGT AGATGTTTCG ACGGACTAAA TAATGTGAAA AAGGTTCTAA 

501 AACCTTCAAA ACAATTAAAC TTGAGAAACG TTGCTATAGG ATTGAGCTAA 

551 TAATTTGAAT TAATAGGAGC TGCTTTTTAC TTTGATATAT CCTGAAGTTA 

601 TGTTACGAGT TTCTGAAAAT GGTCTACGGT TTCTGCTGAA ATGCCATTCA 

651 ACGAACGTAA GCATGTTTAA TAGGCTTCTG AGTACTCAAA TAAAGGAGGG 

701 GAGAAGTTCC ATAGATGATG CTGGCATTAT CCCCGATGGA ACTATTAACG 

751 AAAGGCCGAA TCACTACATC GAGGGAATTA CTAAAGGCAG TGATCTGGAC 

801 CTCTTGGAAA AAGGTATAAG AAAAACTGAC GAAATGACTT CCAATTTTAC 

851 AAATTATATG TACAAGTTTC ACAGATTGCC CCCCAACTAT GGAAGTAACC 

901 AGCTCATTAC TATCGATAAG GAACTTCAAA AGGAACTGGA TGGGGTAATG 

951 TCATCCTTCA AAGCTCCGTG CCGGTTTGTA TTTGGTTACG GCTCAGGAGT 

1001 TTTCGAACAA GCGGGATATT CCAAAAGTCA TAGCAAACCT CAAATCGATA 

1051 TAATCTTGGG CGTCACATAT CCATCACATT TTCACTCTAT TAATATGAGG 

1101 CAGAATCCGC AACATTATTC AAGTTTGAAA TACTTCGGTT CCGAGTTCGT 

1151 GTCTAAATTT CAACAGATCG GCGCAGGCGT ATATTTTAAT CCATTTGCAA 

1201 ATATAAATGG ACACGACGTA AAATATGGGG TGGTTTCTAT GGAAACACTT 

1251 TTAAAGGACA TAGCTACTTG GAATACATTC TATTTAGCAG GACGACTACA 

1301 AAAGCCTGTA AAAATATTGA AGAATGATTT GAGAGTGCAA TATTGGAACC 

1351 AATTAAACTT AAAAGCTGCA GCTACTTTGG CCAAACATTA CACCTTAGAG 

1401 AAAAATAACA ATAAGTTTGA CGAATTCCAA TTTTACAAGG AGATCACTGC 
1451 CTTAAGTTAT GCAGGTGATA TTAGATACAA ACTGGGTGGA GAAAATCCCG 

1501 ACAAAGTTAA CAACATTGTT ACCAAAAACT TTGAAAGATT TCAAGAGTAT 
1551 TACAAGCCGA TTTACAAAGA AGTGGTCCTA AATGATTCAT TTTATCTTCC 
1601 AAAAGGGTTC ACATTAAAGA ATACTCAGAG ACTTTTGCTC AGCCGTATTA 
1651 GTAAATCAAG TGCATTACAA ACTATTAAAG GTGTTTTCAC AGCTGGAATC 
1701 ACAAAGTCAA TTAAGTATGC TTGGGCCAAA AAACTAAAAT CGATGAGGAG 
1751 AAGCTAG 
FIGURE 59. YGR046w Protein Sequence 

Nature 387:81-84 [97313265] (1997) The nucleotide sequence of 
Saccharomyces cerevisiae chromosome VII. 

Tettelin, H., Agostoni Carbone, M. L., Albermann, K., Albers, M., 
Arroyo, J., Backes, U., Barreiros, T., Bertani, I., Bjourson, A. J., 
Bruckner, M., Bruschi, C. V., Carignani, G., Castagnoli, L., Cerdan, et 

YGR046W Length: 385 March 26, 1999 16:46 Type: P Check: 4137 .. 



1 MLRVSENGLR FLLKCHSTNV SMFNRLLSTQ IKEGRSSIDD AGIIPDGTIN 
51 ERPNHYIEGI TKGSDLDLLE KGIRKTDEMT SNFTNYMYKF HRLPPNYGSN 
101 QLITIDKELQ KELDGVMSSF KAPCRFVFGY GSGVFEQAGY SKSHSKPQID 
151 IILGVTYPSH FHSINMRQNP QHYSSLKYFG SEFVSKFQQI GAGVYFNPFA 
201 NINGHDVKYG VVSMETLLKD IATWNTFYLA GRLQKPVKIL KNDLRVQYWN 
251 QLNLKAAATL AKHYTLEKNN NKFDEFQFYK EITALSYAGD IRYKLGGENP 
301 DKVNNIVTKN FERFQEYYKP IYKEVVLNDS FYLPKGFTLK NTQRLLLSRI 
351 SKSSALQTIK GVFTAGITKS IKYAWAKKLK SMRRS 
FIGURE 60. YJR041c DNA Sequence 

This sequence includes 1000bp of 5' promoter sequence. 

Symbols: 1 to: 4525 from: chr10.gcg /rev ck: 4711, 509927 to: 514451 

Chromosome X Sequence 

EMBO J. 15:2031-2049 [8641269] (1996) Complete nucleotide sequence of 
Saccharomyces cerevisiae chromosome X. 

Galibert, F., Alexandraki, D., Baur, A., Boles, E., Chalwatzis, N., 
Chuat, J. C., Coster, F., Cziepluch, C., De Haan, M., Domdey, H., . . . 

gcgseq.tmp.25123 Length: 4525 March 26, 1999 11:33 Type: N Check: 4481 

1 TACCTGCTGT AGAATCCTTC ACTGAAAACA CTTGTTCAAT ATATTCTTCA 

51 TCTGGTTCAC CGTCTGATCT ATTAATCCAG TTTAGCAATG ACTCAATAAA 

101 CTCTGATCTG TTCTCCTCTA CATCCTGACC ATCTAATATG AAGTACATTG 

151 TCCTCAGACA GTTTAAAACG GTTAAAGATT CTTCCAACTC ATAAAATCGG 

201 TTCACTCTTC CATCCTGATC CTTGACTCTA CCAATAAACA CTTCCAATTC 

251 ATTCAGAATC GCCTCCATGG CCAGATTTAC TGTTGCATTA TGCTCCTTCG 

301 CGAAATTAGA ATTAACAACT CCAATCGTTG GTACATTAAA CACTCTGTCA 

351 TCACCTAAAT CACGGTAAAT TTCAAATAAA CCTGATACGT ATGCAGAAAA 

401 CTCTTTGCTG GTATCTAATC TAGGAATTCT AACAGGATAA AGCTTATATT 

451 TATCTTTTGC AGTTATGAAT GCCATATTTT GGTAAGAAAG TGGCCCCAGC 

501 TTGAACTTTA AAGGCATCTT GTCGCCATTT TTTTCAATCG GTTGATCATT 

551 TACAGTCATA GGGACCAGGA TAGCCCCGCT GACTGGGTCC CTTTTATATA 

601 GTTGTTCTTC TTCATCGGTC TTGTTATTAC TAAGTTGCGC CGTTCCGTCG 

651 TCCAAAAAAT CAAATTGATC GACGTCCATA AGTAATCGAT TTGAATCATC 

701 GATTGTCATA TCTGATAATT GCGTTCTGGC TCACGCTTAT TGACTCAACT 

751 CAAGACCGTA AGTTCAATGT TTTCTATACA ACTACAATTT GTACAAGGCT 

801 TGACTTCCAT CCAACTAAAA AACCTCTCCG TCGTGCGCGA TCTGAAAAAT 

851 TTCACTTAGC TCATCTCAAA ATGATCGCTA AGAGGGCACT TGGTCACAAC 

901 TACAGAATTG TTTACTAGCA TAGGAACATC TCTGTCTAAG ATTTAGCTTG 

951 CCATCAATTA TCTTTGGAAA AACAGAGAGT ATACTGCACT TTTTGATAAT 

1001 ATGGGTGATC TTACAGAAGA ACTATCTATC CCAGACAATG CCCAAGATTT 

1051 GTCGAAATTA CTACGTTCGA CGAGCACAAA ACCCCATCAA ATTGCCGAGA 

1101 TAGTTTCAAA ATTTGATAAA TTAGAAACCT ACTTTCCAAA AAAAGAAATT 

1151 TTCGTCTTAG ATTTACTCAT TGATAGGCTC AACAATGGAA ATTTGGATGA 

1201 TTTTAAGACC AGTGAACATA CTTGGATAAT TTTCACGAGA TTATTAGATG 

1251 CTATTAATGA TCCAATTTCG ATAAAAAAAC TACTCAAAAA ATTGAAGACT 

1301 GTGCCTGTAA TGATAAGAAC ATTTTTCCTT TGGCCTAAAG ACAAATTACT 

1351 TACACGTAGC GTTTCGTTTA TAAAAGCATT TTTTGCGATT AATGACTACT 

1401 TGATTGTCAA TTTTTCTGTT GAAGAGTCTT TTCAACTTTT AGAACATGCC 

1451 ATAAATGGAT TAAGTTCGTG CCCGACGACT GACTTTGCGC TTTCATACTT 

1501 GCAAGATGCC TGCAATCTAA CTCATGTTGA CAATATTACT ACAACGGATA 

1551 ACAAAATTGC TACTTGTTAC TGCAAGCATA TGCTACTACC AAGTTTAAGA 

1601 TATTTCGCAC AGACCAAAAA TTCTGCATCT TCAAACCAAT CCTTCATTCG 

1651 TCTATCTCAT TTTATGGGAA AGTTCCTTTT ACAACCACGC ATAGATTACA 

1701 TGAAATTAAA TAAAAAGTTT GTCCAAGAGA ATGCGTCCGA AATTACCGAC 

1751 GATATGGCTT ATTATTATTT TGCCACTTTC GTCACTTTCT TATCAAAAGA 

1801 CAATTTTGCT CAACTAGAAG TCATCTTTAC AATTTTAGGT GCCAAGAAAC 

1851 CTAGTTTAGA ATGCAGATTT CTGAATCTTT TATCGGAATC GAAGAAAACC 

1901 GTATCTCAAG AGTTCCTTGA AGCATTATTG CTTGAAATGT TAGCGTCGAC 

1951 TGATGAATCT GGAGTGTTAT CATTAATACC AATTATCCTT AAATTGGATA 

2001 TCGAGGTTGC TATTAAACAT ATTTTTCGGT TACTTGAATT GATTCAGCTC 

2051 GAAAATTTGA ACGATCCTCT CTTTTCCTCT CATATTTGGG ATTTAATAAT 

2101 CCAATCACAC GCTAACGCAA GGGAATTATC AGATTTTTTT GCCAAAATAA 

2151 ATGAGTACTG TTCCAGAAAA GGACCCGATT CCTATTTTTT GATAAATCAT 

2201 CCTGCATATG TCAAGTCTAT AACGAAGCAA TTGTTCACTT TATCTTCTTT 

2251 ACAATGGAAA AATCTATTGC AAGCTTTACT TGACCAAGTC AATCACGATT 

2301 CCACCAACAG GGTTCCTTTA TATTTAATAC GCATATGCTT GGAGGGACTA 
2351 TCAGAGGGCG CATCGCGCGC AACTCTCGAT GAGGTAAAGC CTATTTTATC 

2401 TCAAGTATTT ACTTTGGAAT CATTTAATAA CAGTCTTCAA TGGGACCTAA 

2451 AGTATCATAT AATGGAAGTC TACGATGATA TTGTCCCTGC AGAGGAACTA 

2501 GAAAAAATCG ATTACGTGTT ATCTTCTAAT ATTTTTGATA CTACATCGGC 

2551 TGATGTTGAA GAACTGTTCT TTTATTGCTT CAAATTGAGA GAATATATTT 

2601 CGTTCGATCT TTCTGATGCA AAAAAAAAAT TCATGAGGCA CTTTGAAATC 

2651 CTTGACGAAG AAAGAAAGTC AAACTTATCA TACTCTGTTG TGTCCAAATT 

2701 TGCAACATTA GTAAACAACA ACTTTACAAG AGAACAAATT TCTTCTTTAA 

2751 TTGATTCATT ACTATTGAAC TCGACAAATT TATCTTCGTT ATTAAAAAAT 

2801 GATGACATTT TTGAGGAGAC AAATATCACG TACGCTTTAA TAAACAAGCT 

2851 TGCTTCATCA TACCATCAAA CCTTCGCTCT AGAAGCTTTG ATTCAAATTC 

2901 CTATCCAATG CATCAACAAA AACGTTAGAG TGGCTCTCAT TAACAATCTA 

2951 ACATGCGAAT CATTTTGCCT TGATTCCGCT ACTAGAGAAT GCCTCCTTCA 

3001 TTTATTGTCA AGCCCGACCT TCAAGAGCAA CATTGAAACA AATTTCTACG 

3051 AATTATGTGA GAAAACAATA ATGAGCCCCG AAATGGCCAT TTCAGAGACA 

3101 GGTGATGAAA AAAAGGAAAT AGAAGACAAA ATATCTATTT TCGAAAAAGT 

3151 TTGGACTAAT CATCTGTCAC AGGCAAAGGA GCCTGTGAGT GAGAAGTTCT 

3201 TAGAATCTGG TTACGATATC GTTAAACAGT CAATGTCATT GTCCAATGGT 

3251 GATAGCAAAC TAATTATCGC CGGGTTTACT ATCGCAAAAT TTTTGAAACC 

3301 AGATAACAAG CATAGAGATA TACAAGGTAT GGCAATTAGC TATGCTGTTA 

3351 AAATTTTGGA AAACTACTCT GAAAATTTTG AATCTGAAAC AATTCCCCTT 

3401 TTCAGAATAT CAATGTCTAC ATTGTACAAG ATTATAACGA CCGGACAAGG 

3451 CGATATTTCT AAGCATAAAT CGAGAATTCT GGATATATTT TCCAAAATTA 

3501 TGCTTCGATA TCATTCTAAA AAAGTGTACC ATGCGCCAGA AGAACAGGAA 

3551 ATGTTTTTGG TTCATTCACT CCTTACAGAA AACAAGTTGG AGTATATTTT 

3601 TGCAGAGTAC TTAAATATTG AGCATACAGA TAAGTGCGAT TCTGCCTTGG 

3651 GGTTCTGCTT GGAAGAAAGT CTTAAACAAG GTCCTGATGC GTTTAACCGC 

3701 CTGCTCTGGA ACAGTGCTAA ATCGTTTTCC ACCATTAGCC AACCTTGTGC 

3751 TGAAAAATTT GTGAGAGTTT TTATCATAAT GTCAAAAAGG ATTGCAAGAG 
3801 ACAATAACCT TGGTCATCAC CTATTTGTGA TAGCTTTACT TGAAGCCTAC 

3851 ACCTATTGTG ATATAGAAAA ATTTGGCTAC AAGTCATACT TGCTACTGTT 
3901 CAATGCTATC AAGGAGTTCT TAGTATCGAA ACCATGGCTA TTCAGCCAAT 
3951 ACTGTATTGA AATGCTGCTT CCTTTCTGTT TAAAAACTCT CGCTTTTATA 
4001 GTAAACCATG AGTCAACGGA TGAAATCAAT GAAGGCTTTA TTAACATCAT 
4051 CGAAGTGATA GATCATATGC TATTAGTTCA CAGGTTTAAA TTTTCCAATC 
4101 GTCACCATTT GTTTAACTCC GTTCTTTGCC AGATACTAGA AATAATAGCA 
4151 ATTCATGATG GTACATTGTG TGCAAATTCA GCAGACGCCG TAGCCAGACT 
4201 AATAACGAAC TACTGCGAGC CTTATAATGT ATCAAACGCT CAAAATGGGC 
4251 AGAAAAATAA CTTAAGCTCA AAGATAAGTT TGATAAAGCA GTCCATCAGA 
4301 AAAAATGTAC TTGTGGTTCT AACGAAATAT ATACAGTTGT CTATTACGAC 
4351 GCAGTTCAGT TTAAACATAA AAAAGAGTCT GCAGCCCGGT ATTCATGCGA 
4401 TTTTTGATAT ATTATCTCAG AACGAGTTGA ATCAATTGAA CGCTTTCCTT 
4451 GACACACCTG GGAAACAATA TTTCAAAGCA CTTTACCTCC AATACAAAAA 
4501 GGTTGGTAAA TGGCGCGAAG ATTAA 



FIGURE 60 (cont). 
FIGURE 61. YJR041c Protein Sequence 

EMBO J. 15:2031-2049 [8641269] (1996) Complete nucleotide sequence of 
Saccharomyces cerevisiae chromosome X. 

Galibert, F., Alexandraki, D., Baur, A., Boles, E., Chalwatzis, N., 
Chuat, J. C., Coster, F., Cziepluch, C., De Haan, M., Domdey, H., 
Durand, P., Entian, K. D., Gatius, M., Goffeau, A., Grivell, L. A., et al. 

YJR041C Length: 1174 March 26, 1999 11:35 Type: P Check: 5083 .. 

1 MGDLTEELSI PDNAQDLSKL LRSTSTKPHQ IAEIVSKFDK LETYFPKKEI 

51 FVLDLLIDRL NNGNLDDFKT SEHTWIIFTR LLDAINDPIS IKKLLKKLKT 

101 VPVMIRTFFL WPKDKLLTRS VSFIKAFFAI NDYLIVNFSV EESFQLLEHA 

151 INGLSSCPTT DFALSYLQDA CNLTHVDNIT TTDNKIATCY CKHMLLPSLR 

201 YFAQTKNSAS SNQSFTRLSH FMGKFLLQPR IDYMKLNKKF VQENASEITD 

251 DMAYYYFATF VTFLSKDNFA QLEVIFTILG AKKPSLECRF LNLLSESKKT 

301 VSQEFLEALL LEMLASTDES GVLSLIPIIL KLDIEVAIKH IFRLLELIQL 

351 ENLNDPLFSS HIWDLIIQSH ANARELSDFF AKINEYCSRK GPDSYFLINH 

401 PAYVKSITKQ LFTLSSLQWK NLLQALLDQV NHDSTNRVPL YLIRICLEGL 

451 SEGASRATLD EVKPILSQVF TLESFNNSLQ WDLKYHIMEV YDDIVPAEEL 

501 EKIDYVLSSN IFDTTSADVE ELFFYCFKLR EYISFDLSDA KKKFMRHFEI 

551 LDEERKSNLS YSVVSKFATL VNNNFTREQI SSLIDSLLLN STNLSSLLKN 

601 DDIFEETNIT YALINKLASS YHQTFALEAL IQIPIQCINK NVRVALINNL 

651 TCESFCLDSA TRECLLHLLS SPTFKSNIET NFYELCEKTI MSPEMAISET 

701 GDEKKEIEDK ISIFEKVWTN HLSQAKEPVS EKFLESGYDI VKQSMSLSNG 

751 DSKLIIAGFT IAKFLKPDNK HRDIQGMAIS YAVKILENYS ENFESETIPL 

801 FRISMSTLYK IITTGQGDIS KHKSRILDIF SKIMLRYHSK KVYHAPEEQE 

851 MFLVHSLLTE NKLEYIFAEY LNIEHTDKCD SALGFCLEES LKQGPDAFNR 

901 LLWNSAKSFS TISQPCAEKF VRVFIIMSKR IARDNNLGHH LFVIALLEAY 

951 TYCDIEKFGY KSYLLLFNAI KEFLVSKPWL FSQYCIEMLL PFCLKTLAFI 

1001 VNHESTDEIN EGFINIIEVI DHMLLVHRFK FSNRHHLFNS VLCQILEIIA 

1051 IHDGTLCANS ADAVARLITN YCEPYNVSNA QNGQKNNLSS KISLIKQSIR 

1101 KNVLVVLTKY IQLSITTQFS LNIKKSLQPG IHAIFDILSQ NELNQLNAFL 

1151 DTPGKQYFKA LYLQYKKVGK WRED 
FIGURE 62. HES1 DNA Sequence 

DNA sequence includes 1089bp 5' promoter sequence. 

Symbols: 1 to: 2394 from: chr15.gcg ck: 9129, 780903 to: 783296 

Chromosome XV Sequence 

Nature 387:98-102 [97313270] (1997) The nucleotide sequence of 
Saccharomyces cerevisiae chromosome XV. 

Dujon, B., Albermann, K., Aldea, M., Alexandraki, D., Ansorge, W., 
Arino, J., Benes, V., Bohn, C., Bolotin-Fukuhara, M., Bordonne, R., . . . 

gcgseq.tmp.10515 Length: 2394 March 26, 1999 14:35 Type: N Check: 4842 



1 CATGGCTGGA GGAAAGATTC CTATTGTAGG AATTGTGGCA TGTTTACAGC 

51 CGGAGATGGG GATAGGATTT CGTGGAGGTC TACCATGGAG GTTGCCCAGT 

101 GAAATGAAGT ATTTCAGACA GGTCACTTCA TTGACGAAAG ATCCAAACAA 

151 AAAAAATGCT TTGATAATGG GAAGGAAGAC ATGGGAATCC ATACCGCCCA 

201 AGTTTCGCCC ACTGCCCAAT AGAATGAATG TCATTATATC GAGAAGCTTC 

251 AAGGACGATT TTGTCCACGA TAAAGAGAGA TCAATAGTCC AAAGTAATTC 

301 ATTGGCAAAC GCAATAATGA ACCTAGAAAG CAATTTTAAG GAGCATCTGG 

351 AAAGAATCTA CGTGATTGGG GGTGGCGAAG TTTATAGTCA AATCTTCTCC 

401 ATTACAGATC ATTGGCTCAT CACGAAAATA AATCCATTAG ATAAAAACGC 

451 AACTCCTGCA ATGGACACTT TCCTTGATGC GAAGAAATTG GAAGAAGTAT 

501 TTAGCGAGCA AGATCCGGCC CAGCTGAAAG AATTTCTTCC CCCTAAAGTA 

551 GAGTTGCCCG AAACAGACTG TGATCAACGC TACTCGCTGG AAGAAAAAGG 

601 TTATTGCTTC GAATTCACTC TATACAATCG TAAATGAAAC CTCTCCGCCC 

651 GTATATTTTT TTTAATATGT TAAATAGTGA TAGAACTGAT AAGCCTCATT 

701 TTCTTTTATT GGGCTCCAAG ACGCGAACTG TTCGTAGGGT AACCGTTTGA 

751 CACCTAAACG ACCTTTCAGC CTCACCTGCA GTATTTCTTC AACAACGCCT 

801 GTCGCTATGT TAAATAATAG CAATCGTTTG TGATCACCAT TGTCGAATTT 

851 GACGCGCTTA AACAAAAACC ATTGTTTTGG CCTCGTTCCC TGCATTCAAC 

901 AAAAGAGCAA GGTATGCCGT CAAACAGTCG TTAAAAGAGA AGGTTTATAA 

951 ACTATCTTGT TTTGTACTTT GCTGTCCCGG ATCCAGTTGG GTCTTCTTTT 

1001 CAACCTGTCT GAGTCCGATC TTTCTTTCCC TACTTGAAGC TCCATATATC 

1051 TAAGTCATCT AAGTGTATCC TGCTAGATTA CAAACGAAAA TGTCTCAACA 

1101 CGCAAGCTCA TCTTCTTGGA CTTCTTTTTT GAAATCGATA AGTTCGTTCA 

1151 ACGGAGATCT ATCGTCTTTG TCTGCACCAC CGTTTATTCT TTCTCCCACT 

1201 TCCTTAACAG AGTTTTCTCA GTATTGGGCT GAACATCCAG CTTTATTTCT 

1251 GGAGCCTTCG TTGATTGATG GTGAAAACTA CAAAGATCAC TGTCCCTTTG 

1301 ACCCAAATGT GGAATCAAAG GAAGTGGCGC AGATGTTGGC GGTTGTTAGG 

1351 TGGTTTATTT CTACTTTGAG ATCTCAATAC TGCTCTAGAA GCGAATCGAT 

1401 GGGTTCTGAA AAGAAGCCTT TGAACCCATT CTTGGGTGAG GTATTTGTTG 

1451 GAAAGTGGAA AAATGATGAG CATCCAGAGT TTGGTGAAAC GGTTCTTTTA 

1501 AGTGAGCAAG TTTCACATCA TCCACCTATG ACAGCATTTT CGATTTTTAA 

1551 TGAAAAAAAT GATGTTTCTG TTCAAGGATA CAATCAAATT AAAACTGGTT 

1601 TTACCAAAAC ATTGACGCTA ACGGTCAAAC CATACGGGCA TGTCATTTTG 

1651 AAGATTAAAG ATGAGACCTA CCTGATTACA ACCCCGCCTT TGCATATCGA 

1701 AGGTATTTTA GTCGCTTCTC CATTTGTTGA ATTAGGAGGC AGGTCATTCA 

1751 TACAGTCATC AAATGGTATG TTATGTGTTA TAGAATTTTC AGGAAGGGGG 

1801 TATTTCACAG GGAAGAAGAA CTCCTTTAAG GCAAGAATTT ACAGAAGCCC 

1851 ACAAGAGCAT AGTCATAAAG AAAATGCGCT ATACCTAATC TCTGGCCAAT 

1901 GGTCAGGTGT TTCAACAATT ATAAAAAAAG ACTCGCAAGT TTCACATCAG 

1951 TTTTACGATT CATCGGAAAC TCCTACTGAA CATTTATTAG TTAAGCCAAT 

2001 CGAAGAACAA CATCCTCTGG AAAGTAGGAG GGCATGGAAG GATGTGGCAG 

2051 AAGCAATCAG ACAAGGAAAT ATTAGTATGA TAAAAAAGAC TAAGGAAGAA 

2101 CTAGAAAATA AGCAAAGAGC CTTGAGAGAA CAAGAACGCG TAAAAGGTGT 

2151 GGAATGGCAA AGAAGATGGT TCAAACAAGT GGACTACATG AATGAAAATA 

2201 CATCAAATGA TGTAGAGAAA GCAAGTGAAG ATGATGCCTT TAGGAAATTG 

2251 GCGTCCAAAC TGCAGCTTTC TGTGAAAAAT GTGCCAAGTG GGACATTGAT 

2301 TGGCGGCAAA GATGATAAGA AAGATGTTTC AACCGCATTG CATTGGAGGT 
FIGURE 63. HES1 Protein Sequence 

Nature 387:98-102 [97313270] (1997) The nucleotide sequence of 
Saccharomyces cerevisiae chromosome XV. 

Dujon, B., Albermann, K., Aldea, M., Alexandraki, D., Ansorge, W., 
Arino, J., Benes, V., Bohn, C., Bolotin-Fukuhara, M., Bordonne, R., 
Boyer, J., Camasses, A., Casamayor, A., Casas, C., Cheret, G., et al. 

YOR237W Length: 434 March 26, 1999 14:37 Type: P Check: 7501 . 



1 MSQHASSSSW TSFLKSISSF NGDLSSLSAP PFILSPTSLT EFSQYWAEHP 
51 ALFLEPSLID GENYKDHCPF DPNVESKEVA QMLAVVRWFI STLRSQYCSR 
101 SESMGSEKKP LNPFLGEVFV GKWKNDEHPE FGETVLLSEQ VSHHPPMTAF 
151 SIFNEKNDVS VQGYNQIKTG FTKTLTLTVK PYGHVILKIK DETYLITTPP 
201 LHIEGILVAS PFVELGGRSF IQSSNGMLCV IEFSGRGYFT GKKNSFKARI 
251 YRSPQEHSHK ENALYLISGQ WSGVSTIIKK DSQVSHQFYD SSETPTEHLL 
301 VKPIEEQHPL ESRRAWKDVA EAIRQGNISM IKKTKEELEN KQRALREQER 
351 VKGVEWQRRW FKQVDYMNEN TSNDVEKASE DDAFRKLASK LQLSVKNVPS 
401 GTLIGGKDDK KDVSTALHWR FDKNLWMREN EITI 








Figure 64. [Plot not reproduced; both axes are labeled from 1e+4 to 1e+7.] 






FIGURE 65. Rat Gene with Similarity to YLR100w 



LOCUS 1397235 334 aa 04-FEB-1999 

DEFINITION ovarian-specific protein. 
ACCESSION 1397235 
PID gl397235 

DBSOURCE locus RNU44803 accession U448031 
KEYWORDS 

SOURCE Norway rat. 

ORGANISM Rattus norvegicus 

Eukaryotae; mitochondrial eukaryotes; Metazoa; Chordata; 
Vertebrata; Eutheria; Rodentia; Sciurognathi; Myomorpha; Muridae; 
Murinae; Rattus. 
REFERENCE 1 (residues 1 to 334) 

AUTHORS Duan,W.R., Linzer,D.I.H. and Gibori,G. 

TITLE Cloning and characterization of an ovarian-specific protein that 

associates with the short form of the prolactin receptor 
JOURNAL J. Biol. Chem. 271 (26), 15602-15607 (1996) 
MEDLINE 96279080 
REFERENCE 2 (residues 1 to 334) 
AUTHORS Gibori,G. and Duan,W.R. 
TITLE Direct Submission 

JOURNAL Submitted (05-JAN-1996) Geula Gibori, Department of Physiology, 
University of Illinois at Chicago, Chicago, IL 60612, USA 
COMMENT Method: conceptual translation. 

FEATURES Location/Qualifiers 
source 1..334 

/organism="Rattus norvegicus" 
/strain="Sprague-Dawley" 
/db_xref="taxon:10116" 
/sex="female" 

/tissue_type="corpus luteum" 

/dev_stage="pregnant" 

/cell_type="luteal" 
Protein 1..334 

/product="ovarian-specific protein" 
CDS 1..334 

/note="The protein can associate with the short form of 

prolactin receptor in the rat corpus luteum." 

/coded_by="U44803:15..1019" 

ORIGIN 

1 mrkvvlitga ssgiglalcg rllaedddlh lclacrnlsk agavrdalla shpsaevsiv 
61 qmdvsnlqsv vrgaeevkrr fqrldylyln agimpnpqln lkaffcgifs rnvihmfsta 
121 eglltqndki tadgfqevfe tnlfghfili relepllchs dnpsqliwts srnakksnfs 
181 lediqhakgq epyssskyat dllnvalnrn fnqkglyssv tcpgvvmtnl tygilppfvw 
241 tlllpviwll rffahaftvt pyngaealvw lfhqkpesln pltkylsgtt glgtnyvkgq 
301 kmdvdedtae kfyktllele kqvritiqks dhhs 

// 
FIGURE 66. DAK1 DNA Sequence 

This sequence contains 1200bp of 5' promoter sequence. 
Symbols: 1 to: 2955 from: chr13.gcg ck: 8335, 132275 to: 135229 

Chromosome XIII Sequence 

Nature 387:90-93 [97313268] (1997) The nucleotide sequence 
of Saccharomyces cerevisiae chromosome XIII. 
Bowman, S., Churcher, C., Badcock, K., Brown, D., 
Chillingworth, T., Connor, R., Dedman, K., Devlin, K., 
Gentles, S., Hamlin, N., Hunt, S., . . . 

gcgseq.tmp.16080 Length: 2955 March 31, 1999 09:57 Type: N Check: 5254 .. 

1 TAATATAAAT ACTAGTCGTT AGATGATAGT TGCTTCTTAT TCCGAAAATG 

51 AGTATGGAAG TGTTGCATAT GATAGGGCGG CTACAGTGAT GGTAAACATA 

101 AGATACTTTA GCGGGAAATT AGCAACTGGA AGTTAAATTA TCTAGACATA 

151 AGTGTGGCGG TCACGCTGAA CGCAGGAGAT CGGATAGATT GATAAGCTGA 

201 TCAAGAACAT TGATCGGTTT GTTGTTTAAA GAATGGTTTT TGAAAACGTT 

251 TGACCAGTTG CTTCTCCCAG ACGCTTACCG ATATGATGAT AAAGATAATA 
301 TCTTCAATTG AATACCCCGT GGATCAGCAC GAATAACAGA AAAAAAGGGT 

351 GAAATTCACC GTAAGCATGA TACGCACTAC GTTCTTCTTA CCTTTGCCAA 

401 CGTGTTGTCT TTGACGTACG TAATTATGGG AGATCGTTGA TGATTAGCCC 
451 CAGCTCACTT TCTTCTTAAT GACTGACCCG CTACTATCAA AATTAAGGTG 
501 TCAAATATCA TGATGAATGA GGTCTCTAGG CGACTCAATT ATACATCTTT 
551 TAGAGATTTT TTTACTACTT GCAGATAATT TCTCAAGGGA TTAGATTCAA 
601 ATCTGGCTTG TCAATTACGC CCTTTTCAAG CTCATCAAAT TGCGTATGTC 
651 ATTCATGCTT CCATTAGGAA CCATAGAAGC ATGGCTGAAA TGGCAATATA 

701 CGGCTTCCCA ATTTCAACTC TAAAGTAATG GCGGTCGAAT TTAATCTATA 
751 TTTTACAGTT TTATACGTAC TTTAAAAGCA ATCAGTAAAC ACCTCTGGTG 
801 CTATTCAAGG GTTTTTTGCC TTTATTTGTT ACTGTCAATT GTCTGGCGCT 

851 GTGATAAAAA ACAAGGCATA AAGCTCCCCC GTCATGAACA TTAAGACTCG 
901 CTAGACGAGA GAGTGAAATA TAATGCATTT CCTGATTTAA ATGCGCTACA 
951 AACATGGTGT AAATCTGGCC CGGAGTGAGT GCTTGCCAAT TTGGCTTCTA 

1001 AGGGAGAAAG ATCAAACCAC TCCCAATTGC GTCATTTTGA AAGAGTGGCC 

1051 ACCTCGCGAG CGTCTGTCGA ACTAACTGAT GAATAAATAT ATAAGGAGAA 

1101 AATCACTTCA ACTTCGCTAC AAGTAGTCAC TATTTGTAGC AACTGTAAAC 

1151 GAACACATCA AAGAATAAGA TTACATTCTA TATCTAAGAC TAAATTTTAA 

1201 ATGTCCGCTA AATCGTTTGA AGTCACAGAT CCAGTCAATT CAAGTCTCAA 

1251 AGGGTTTGCC CTTGCTAACC CCTCCATTAC GCTGGTCCCT GAAGAAAAAA 
1301 TTCTCTTCAG AAAGACCGAT TCCGACAAGA TCGCATTAAT TTCTGGTGGT 

1351 GGTAGTGGAC ATGAACCTAC ACACGCCGGT TTCATTGGTA AGGGTATGTT 

1401 GAGTGGCGCC GTGGTTGGCG AAATTTTTGC ATCCCCTTCA ACAAAACAGA 
1451 TTTTAAATGC AATCCGTTTA GTCAATGAAA ATGCGTCTGG CGTTTTATTG 
1501 ATTGTGAAGA ACTACACAGG TGATGTTTTG CATTTTGGTC TGTCCGCTGA 
1551 GAGAGCAAGA GCCTTGGGTA TTAACTGCCG CGTTGCTGTC ATAGGTGATG 
1601 ATGTTGCAGT TGGCAGAGAA AAGGGTGGTA TGGTTGGTAG AAGAGCATTG 
1651 GCAGGTACCG TTTTGGTTCA TAAGATTGTA GGTGCCTTCG CAGAAGAATA 
1701 TTCTAGTAAG TATGGCTTAG ACGGTACAGC TAAAGTGGCT AAAATTATCA 
1751 ACGACAATTT GGTGACCATT GGATCTTCTT TAGACCATTG TAAAGTTCCT 
1801 GGCAGGAAAT TCGAAAGTGA ATTAAACGAA AAACAAATGG AATTGGGTAT 
1851 GGGTATTCAT AACGAACCTG GTGTGAAAGT TTTAGACCCT ATTCCTTCTA 
1901 CCGAAGACTT GATCTCCAAG TATATGCTAC CAAAACTATT GGATCCAAAC 
1951 GATAAGGATA GAGCTTTTGT AAAGTTTGAT GAAGATGATG AAGTTGTCTT 
2001 GTTAGTTAAC AATCTCGGCG GTGTTTCTAA TTTTGTTATT AGTTCTATCA 
2051 CTTCCAAAAC TACGGATTTC TTAAAGGAAA ATTACAACAT AACCCCGGTT 
2101 CAAACAATTG CTGGCACATT GATGACCTCC TTCAATGGTA ATGGGTTCAG 
2151 TATCACATTA CTAAACGCCA CTAAGGCTAC AAAGGCTTTG CAATCTGATT 
2201 TTGAGGAGAT CAAATCAGTA CTAGACTTGT TGAACGCATT TACGAACGCA 
2251 CCGGGCTGGC CAATTGCAGA TTTTGAAAAG ACTTCTGCCC CATCTGTTAA 
2301 CGATGACTTG TTACATAATG AAGTAACAGC AAAGGCCGTC GGTACCTATG 
2351 ACTTTGACAA GTTTGCTGAG TGGATGAAGA GTGGTGCTGA ACAAGTTATC 
2401 AAGAGCGAAC CGCACATTAC GGAACTAGAC AATCAAGTTG GTGATGGTGA 
2451 TTGTGGTTAC ACTTTAGTGG CAGGAGTTAA AGGCATCACC GAAAACCTTG 
2501 ACAAGCTGTC GAAGGACTCA TTATCTCAGG CGGTTGCCCA AATTTCAGAT 
2551 TTCATTGAAG GCTCAATGGG AGGTACTTCT GGTGGTTTAT ATTCTATTCT 
2601 TTTGTCGGGT TTTTCACACG GATTAATTCA GGTTTGTAAA TCAAAGGATG 
2651 AACCCGTCAC TAAGGAAATT GTGGCTAAGT CACTCGGAAT TGCATTGGAT 
2701 ACTTTATACA AATATACAAA GGCAAGGAAG GGATCATCCA CCATGATTGA 
2751 TGCTTTAGAA CCATTCGTTA AAGAATTTAC TGCATCTAAG GATTTCAATA 
2801 AGGCGGTAAA AGCTGCAGAG GAAGGTGCTA AATCCACTGC TACATTCGAG 
2851 GCCAAATTTG GCAGAGCTTC GTATGTCGGC GATTCATCTC AAGTAGAAGA 
2901 TCCTGGTGCA GTAGGCCTAT GTGAGTTTTT GAAGGGGGTT CAAAGCGCCT 
2951 TGTAA 



FIGURE 66 (cont). 
FIGURE 67. DAK1 Protein Sequence 

Nature 387:90-93 [97313268] (1997) The nucleotide sequence 
of Saccharomyces cerevisiae chromosome XIII. 

Bowman, S., Churcher, C., Badcock, K., Brown, D., 
Chillingworth, T., Connor, R., Dedman, K., Devlin, K., 
Gentles, S., Hamlin, N., Hunt, S., Jagels, K., Lye, G., 
Moule, S., Odell, C., Pearson, D., Rajandream, et al. 

YML070W Length: 584 March 31, 1999 09:58 Type: P Check: 1 67 



1 MSAKSFEVTD 
51 GSGHEPTHAG 
101 IVKNYTGDVL 
151 AGTVLVHKIV 
201 GRKFESELNE 
251 DKDRAFVKFD 
301 QTIAGTLMTS 
351 PGWPIADFEK 
401 KSEPHITELD 
451 FIEGSMGGTS 
501 TLYKYTKARK 
551 AKFGRASYVG 
FIGURE 68. PGU1 DNA Sequence 

DNA sequence includes 1200bp of 5' promoter sequence. 
Symbols: 1 to: 2286 from: chr10.gcg ck: 4711, 721304 to: 723589 
Chromosome X Sequence 

EMBO J. 15:2031-2049 [8641269] (1996) Complete nucleotide 
sequence of Saccharomyces cerevisiae chromosome X. 
Galibert, F., Alexandraki, D., Baur, A., Boles, E., 
Chalwatzis, N., Chuat, J. C., Coster, F., Cziepluch, C., De 
Haan, M., Domdey, H., . . . 

gcgseq.tmp.30022 Length: 2286 March 31, 1999 09:20 Type: N Check: 4618 .. 

1 ATGATTCTGA CGACCCTTTG ATAGTGGCAA TGATCAAAAA GAAAAAAAAA 

51 AGATAAGACG GTAGTGTGAA GATGACATAT AGCGCTACTC TATACTCGTC 

101 CAACTTCGAA AATAATATGT GGTCGTTGGT ACGTTCAGAT AAGAGAATAC 

151 ATCTCGCGCG TACGCATAAT TGTGGTCTAA AAAACCGCTG AAATTTTCTC 

201 AATACTGAAT AGAATCACGC TACTACGACA AGACTCGGTT ACTGTGCCTA 

251 AAATAATCCT GTGATAAACG AGTTATGTTA AACGCAGTAC AGGGGTTAAA 

301 GGGCATTGAG TTTTTGTGAG TGGAAATGCC CCCGTTATAG CTTCCAGTTT 

351 AATTACAAAT TATCAATTTA AGCAAATATA ACTGGAGGAT TGGGGAGGCG 

401 ACTAAAAATG GCTACCACGC TATTAGACAT ACAACATTGA GTATTTTATG 

451 TAATTTTGTT ACTGCTAGCA CGGCCATGCA ATTGGCAACT GAAAGCTATC 

501 TGACAACTTA AATGATTCTT AAAACAATGA CGACTATAAT CTTCTCTAAG 

551 AAGTTTCATA TCCATCTTCC TCATTATTCA GTTTCTTTTT CCTCTTGAAA 

601 GTATCGTAAA GAACAACGTC TTCACATTAG CTATTAGAAG ACCATTGAAC 

651 TACCGGATAT GAGTAAGAGT GATCTTGCCG GAGAGATAAT AGCTGCACAA 

701 AGGCCAAGGA TTAGATTAAT GGGTGCATTG TACGAAAAAA AATAGTTTAC 

751 AGTCATTTAT TCGCAATAAA TCAATTTTTT TTTCAAAAAA TATGTAAGTC 

801 TGATAAAAAA TTCTTCACTG AAGAGAGATG CTTACATTCT AATTCTTGAA 

851 TAAAAGACTC TCTAACGCTG TGAATTCTCT TTAGCTGTAA CGGAAACAGA 

901 GAGTTATTCC GTAGTCACTG AATTTTTTTT TTTTGACGCT ATTATTTAAA 

951 ACCTAGGATA TCCGTCCCAT ACAAAACGGC CACGAGTTTC AATCCCAGAA 

1001 TGTACGAGTT ATAATTCTCC TAGATGCATG ATACTCGTGC ATTCGTTTAA 

1051 CAATCATACC AATTTCCCAT TTTCGGGATA TTAAACATGA ACATACTTTT 

1101 TTACTGTGAG AATGTGGTTT CACAATTATT CCATACAGGT ATAAAAACGC 

1151 ACAGAACTTC AAACGGGAAG ACTATCTACC CACATTGATG GACAAACGCA 

1201 ATGATTTCTG CTAATTCATT ACTTATTTCC ACTTTGTGCG CTTTTGCGAT 

1251 CGCAACACCT TTGTCAAAAA GAGATTCCTG TACCCTAACA GGATCTTCTT 

1301 TGTCTTCACT CTCAACCGTG AAAAAATGTA GCAGCATCGT TATTAAAGAC 

1351 TTAACTGTCC CAGCTGGACA GACTTTAGAT TTAACTGGGT TAAGCAGTGG 

1401 TACTACTGTT ACGTTTGAAG GCACAACCAC ATTTCAGTAC AAGGAATGGA 

1451 GCGGCCCTTT AATTTCAATC TCAGGGTCTA AAATCAGCGT TGTTGGTGCT 

1501 TCGGGACATA CCATTGATGG TCAAGGAGCA AAATGGTGGG ATGGCTTAGG 

1551 TGATAGCGGT AAAGTCAAAC CGAAGTTTGT AAAGTTGGCG TTGACGGGAA 

1601 CATCTAAGGT CACCGGATTG AATATTAAAA ATGCTCCACA CCAAGTCTTC 

1651 AGCATCAATA AATGTTCAGA TTTAACCATC AGCGACATAA CAATTGATAT 

1701 CAGAGACGGT GATTCGGCTG GTGGTCATAA TACGGATGGG TTTGATGTTG 

1751 GTAGTTCTAG TAACGTCTTA ATTCAAGGAT GTACTGTTTA TAATCAGGAT 

1801 GACTGTATTG CTGTGAATTC CGGTTCAACT ATTAAATTTA TGAACAACTA 
1851 CTGCTACAAT GGCCATGGTA TTTCTGTAGG TTCTGTTGGT GGCCGTTCTG 

1901 ATAATACAGT CAATGGTTTC TGGGCTGAAA ATAACCATGT TATCAACTCT 

1951 GACAACGGGT TGAGAATAAA AACCGTAGAA GGTGCGACAG GCACAGTCAC 

2001 TAATGTCAAC TTTATCAGTA ATAAAATTAG CGGCATAAAA AGTTATGGTA 

2051 TTGTTATCGA AGGCGATTAT TTGAATAGTA AGACTACTGG AACTGCTACA 

2101 GGTGGCGTTC CCATTTCGAA TTTAGTAATG AAGGATATCA CCGGGAGCGT 

2151 GAACTCCACA GCGAAGAGGG TTAAAATTTT GGTGAAAAAC GCTACTAACT 

2201 GGCAATGGTC TGGGGTGTCA ATTACCGGTG GTTCTTCCTA TTCTGGATGT 

2251 TCTGGAATCC CATCTGGATC TGGTGCAAGC TGTTAA 



FIGURE 68 (cont.) 

FIGURE 69. PGU1 Protein Sequence 

EMBO J. 15:2031-2049 [8641269] (1996) Complete nucleotide 
sequence of Saccharomyces cerevisiae chromosome X. 

Galibert, F., Alexandraki, D., Baur, A., Boles, E., 
Chalwatzis, N., Chuat, J. C., Coster, F., Cziepluch, C., De 
Haan, M., Domdey, H., Durand, P., Entian, K. D., Gatius, M., 
Goffeau, A., Grivell, L. A., et al. 



YJR153W Length: 361 March 31, 1999 09:55 Type: P Check: 9795 .. 

1 MISANSLLIS TLCAFAIATP LSKRDSCTLT GSSLSSLSTV KKCSSIVIKD 

51 LTVPAGQTLD LTGLSSGTTV TFEGTTTFQY KEWSGPLISI SGSKISVVGA 

101 SGHTIDGQGA KWWDGLGDSG KVKPKFVKLA LTGTSKVTGL NIKNAPHQVF 

151 SINKCSDLTI SDITIDIRDG DSAGGHNTDG FDVGSSSNVL IQGCTVYNQD 

201 DCIAVNSGST IKFMNNYCYN GHGISVGSVG GRSDNTVNGF WAENNHVINS 

251 DNGLRIKTVE GATGTVTNVN FISNKISGIK SYGIVIEGDY LNSKTTGTAT 

301 GGVPISNLVM KDITGSVNST AKRVKILVKN ATNWQWSGVS ITGGSSYSGC 

351 SGIPSGSGAS C 



82/88 



SUBSTITUTE SHEET (RULE 26) 



WO 00/58521 



PCT/US00/08604 



FIGURE 70. STE18 DNA Sequence 

This sequence contains 600bp of 5' promoter sequence. 
Symbols: 1 to: 933 from: chr10.gcg ck: 4711, 

585156 to: 586088 

Chromosome X Sequence 

EMBO J. 15:2031-2049 [8641269] (1996) Complete nucleotide 
sequence of Saccharomyces cerevisiae chromosome X. 
Galibert, F., Alexandraki, D., Baur, A., Boles, E., 
Chalwatzis, N., Chuat, J. C., Coster, F., Cziepluch, C., De 
Haan, M., Domdey, H., . . . 

gcgseq.tmp.6719 Length: 933 March 31, 1999 10:01 Type: N Check: 8833 

1 TTCGTTTCTG TCTTGTCTCC CGCTGTTACC TAATAACTTC ATGTGATCTG 
51 CTCCCCCTTC TCGTTAAATA CCACCTTTTC ATCAACCCCG TAGGGCGCGA 
101 CACGTCTAAA ATATTAACCT CTGAATACTT ATTGGGTCAA AATGAATGTT 
151 GATAACTTTC CTTTACAAAA AAAAAACTAA TAGAGTATAT GCATTTCGGT 
201 AGTGAAATAT TCGTTAATGC TAATATGCTC AGTAGTGATC CTAGATTACC 
251 AGTTTTACTG CAGCCATCGT ACAATTTTGG AACGAGTATA AAGAGAGAAA 
301 TTAAAAACGA CAAGAAATAT TCGTACTAGC TTCTCTTCCG GCTTGATGAC 
351 AGTCTTAATA TCATCTGCAA CTCTTGAAAT CTTGCTTTAT AGTCAAAATT 
401 TACGTACGCT TTTCACTATA TAATATGATT TGTCAATGTG ATGAGTGAAT 
451 GTCTCCCTGT TACCCGGTTT TCATGTTGAT TTTTGTTTCA GGCTCTAAAT 
501 GTTTGATGCA ATATTTAACA AGGAGAACAG AAATGTTTTG TGACAGCACC 
551 TGTCAATTTT AGGATAGTAG CAATCGCAAA CGTTCTCAAT AATTCTAAGA 
601 ATGACATCAG TTCAAAACTC TCCACGCTTA CAACAACCTC AGGAACAGCA 
651 ACAGCAACAG CAACAGCTTT CCTTAAAGAT AAAACAATTG AAGTTAAAAA 
701 GAATCAACGA ACTTAACAAT AAACTGAGGA AAGAACTCAG CCGTGAAAGA 
751 ATTACTGCTT CAAATGCATG TCTTACAATA ATAAACTATA CCTCGAATAC 
801 AAAAGATTAT ACATTACCAG AACTATGGGG CTACCCCGTA GCAGGATCAA 
851 ATCATTTTAT AGAGGGTTTG AAAAATGCTC AAAAAAATAG CCAAATGTCA 
901 AACTCAAATA GTGTTTGTTG TACGCTTATG TAA 
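
The statement that this listing carries 600 bp of 5' promoter sequence implies that the STE18 coding region begins at position 601. As an illustrative check only, and not part of the original disclosure, the following minimal Python sketch translates positions 601 to 933 of the listing with the standard genetic code. The names `CODON_TABLE`, `CODING_ROWS` and `translate` are hypothetical helpers introduced here for illustration.

```python
from itertools import product

# Standard genetic code, written in the conventional T, C, A, G codon order.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Assumption: the ORF starts right after the 600 bp of promoter sequence.
PROMOTER_LENGTH = 600
ORF_START = PROMOTER_LENGTH + 1  # position 601 in the listing

# Rows 601 to 933 of the STE18 DNA listing (spaces are stripped below).
CODING_ROWS = [
    "ATGACATCAG TTCAAAACTC TCCACGCTTA CAACAACCTC AGGAACAGCA",
    "ACAGCAACAG CAACAGCTTT CCTTAAAGAT AAAACAATTG AAGTTAAAAA",
    "GAATCAACGA ACTTAACAAT AAACTGAGGA AAGAACTCAG CCGTGAAAGA",
    "ATTACTGCTT CAAATGCATG TCTTACAATA ATAAACTATA CCTCGAATAC",
    "AAAAGATTAT ACATTACCAG AACTATGGGG CTACCCCGTA GCAGGATCAA",
    "ATCATTTTAT AGAGGGTTTG AAAAATGCTC AAAAAAATAG CCAAATGTCA",
    "AACTCAAATA GTGTTTGTTG TACGCTTATG TAA",
]


def translate(dna: str) -> str:
    """Translate a DNA string in frame 1, stopping at the first stop codon."""
    dna = "".join(dna.split()).upper()  # drop spaces and line breaks
    residues = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":
            break
        residues.append(aa)
    return "".join(residues)


coding_region = "".join(CODING_ROWS)
protein = translate(coding_region)
print(ORF_START, len(protein), protein)
```

Under these assumptions the translation gives a 110-residue product, which can be compared row by row with the STE18 protein sequence of Figure 71.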
FIGURE 71. STE18 Protein Sequence 

EMBO J. 15:2031-2049 [8641269] (1996) Complete nucleotide 
sequence of Saccharomyces cerevisiae chromosome X. 

Galibert, F., Alexandraki, D., Baur, A., Boles, E., 
Chalwatzis, N., Chuat, J. C., Coster, F., Cziepluch, C., De 
Haan, M., Domdey, H., Durand, P., Entian, K. D., Gatius, M., 
Goffeau, A., Grivell, L. A., et al. 



YJR086W Length: 110 March 31, 1999 10:02 Type: P Check: 6859 .. 

1 MTSVQNSPRL QQPQEQQQQQ QQLSLKIKQL KLKRINELNN KLRKELSRER 
51 ITASNACLTI INYTSNTKDY TLPELWGYPV AGSNHFIEGL KNAQKNSQMS 
101 NSNSVCCTLM 
FIGURE 72. YGL198w DNA Sequence 

This sequence contains 989bp of 5' promoter sequence. 
Symbols: 1 to: 1775 from: chr7.gcg ck: 9962, 

122605 to: 124379 

Chromosome VII Sequence 

Nature 387:81-84 [97313265] (1997) The nucleotide sequence 
of Saccharomyces cerevisiae chromosome VII. 
Tettelin, H., Agostoni Carbone, M. L., Albermann, K., 
Albers, M., Arroyo, J., Backes, U., Barreiros, T., Bertani, 
I., Bjourson, A. J., . . . 

gcgseq.tmp.32650 Length: 1775 March 31, 1999 10:03 Type: N Check: 2850 .. 

1 GAGAATTATT CGCGACTTCA GGTTATCCAA TCGTGTATGT AATCGTATGT 

51 AGGCAAAAGT AAATAGATAT GAACTACATT TTCCTGCTTT ACTTAGACTA 

101 GAGATGTGAC CTCAAAGAAT CTTCTCAAGT AGTATATCTG GAAAAGAGAG 

151 TTTGCAATAA CGACGCCCAA TTGGAAGATG GACCACCATT TAACACGATC 

201 GTTGGTCGAC TCTGCAGTAT TTCTATGCGT CCTTTCTCTA ATAACAATAT 

251 AACTTTGTTC GTCCTTGACT TCCCTGGTTA ATTTGGACAA CTTTCTGACA 

301 GCACTATCCA ATGTATTGGT GTTTGGGTCG TCCAAATCCA CATATACCAC 

351 CCCATGAATG TTGAAAGTCA CGTCTTTTGT CTCGATACCG GTGTTCTCGT 

401 TCAAGAAACA GTATTGGAAA TGTCCCTTGT ATGGAGCAGA CAATGTGATT 

451 TCACCGTGCG ACGTGTCCCT AACCGTTTTC AAAACTTCAT GTCTTTCCGG 

501 CCCGTAGATG ATAAAGTCAC CAGTCAGCTG GCTACTGGAT TGAGGGTTTC 

551 TATCACCGAA CTGGAACGAA ATGGAGAGCT CGTCACCCTT ACTCAAGTCT 

601 TCGAAGAAGC ATCTACGGCC ATAAGCTGGA AGAAGGACAT TATGGGCGGA 

651 CGCCGAGAAG AACAGGAAGC AAGCAATGAC AAACTTAGTA GCAAATGAGG 

701 CCATCCTTAT GCGTGTGTAT TTTTGTGCGG AGGGATACTA TTAAGATTGC 

751 AGTTTCACCA AGTATAGCTT TTTATTTCAT TATAAGTTTC GTGTCAAAAT 

801 GTTTAAGCGA CCCGATCTCT CAGGCTGTTT TGCACGACTT TTCTGACTTT 
851 CCTCGCGTCT TTTTTCATGA AAATTGGATT ACCCGGAGTG ATGATTTTCT 
901 CACAGTGATT TTTCGTCCCC TTTTACAATA GCAAATGAAG CTGTTTTAGC 
951 AATATTTGTA GAAAGATATG TCACAAGAGG GCAGGCAAAA TGTCATACGG 

1001 AAGAGAAGAC ACTACGATTG AGCCCGACTT CATAGAACCA GATGCACCTT 

1051 TGGCTGCTTC CGGGGGTGTT GCTGACAACA TAGGCGGAAC TATGCAGAAT 

1101 TCAGGCAGCA GAGGGACGCT CGACGAGACT GTGCTGCAAA CACTAAAGCG 

1151 AGATGTGGTG GAGATTAATT CCAGACTGAA ACAAGTGGTA TACCCGCATT 

1201 TCCCCTCATT CTTTAGCCCC TCTGATGACG GGATAGGGGC GGCTGATAAC 

1251 GACATTTCAG CCAATTGCGA CCTGTGGGCG CCCCTTGCGT TTATCATATT 

1301 GTATTCTCTA TTTGTATCGC ATGCGCGGTC GCTGTTCTCG AGCCTATTTG 

1351 TGTCTAGTTG GTTCATTTTG CTGGTGATGG CATTGCATCT GAGACTCACC 

1401 AAGCCACACC AGAGGGTGTC GCTGATTTCG TACATCTCCA TTTCCGGGTA 

1451 TTGCTTATTC CCACAAGTGC TGAATGCCTT AGTCTCGCAG ATACTACTTC 

1501 CATTGGCCTA CCATATTGGA AAGCAAAATC GCTGGATTGT GAGGGTCCTG 

1551 TCGCTCGTGA AACTGGTGGT CATGGCGCTG TGCCTGATGT GGTCTGTGGC 

1601 CGCCGTTTCG TGGGTTACCA AGAGCAAGAC CATTATCGAG ATATACCTCT 

1651 GGCACTCTGT CTTTTTTGGC ATGGCTGGTT GTCAACTATT TTATAACACT 

1701 AGTTACATAT GTATAAAACC CAATATTCAT GGACATAGAA TTGCCTATCT 

1751 CGCGAGCCAC GGCAGAAAGT TCTGA 
FIGURE 73. YGL198w Protein Sequence 

Nature 387:81-84 [97313265] (1997) The nucleotide sequence 
of Saccharomyces cerevisiae chromosome VII. 

Tettelin, H., Agostoni Carbone, M. L., Albermann, K., 
Albers, M., Arroyo, J., Backes, U., Barreiros, T., Bertani, 
I., Bjourson, A. J., Bruckner, M., Bruschi, C. V., 
Carignani, G., Castagnoli, L., Cerdan, et al. 



YGL198W Length: 261 March 31, 1999 10:05 Type: P Check: 1705 .. 

1 MSYGREDTTI EPDFIEPDAP LAASGGVADN IGGTMQNSGS RGTLDETVLQ 

51 TLKRDVVEIN SRLKQVVYPH FPSFFSPSDD GIGAADNDIS ANCDLWAPLA 

101 FIILYSLFVS HARSLFSSLF VSSWFILLVM ALHLRLTKPH QRVSLISYIS 

151 ISGYCLFPQV LNALVSQILL PLAYHIGKQN RWIVRVLSLV KLVVMALCLM 

201 WSVAAVSWVT KSKTIIEIYL WHSVFFGMAG CQLFYNTSYI CIKPNIHGHR 

251 IAYLASHGRK F 






